I'm a bit confused about JDBC indexes/document_types

So I'm a bit new to this, admittedly, and I've been reading up on exactly what "indexes" are, and it mostly makes sense. However, as I'm just using this with the Logstash JDBC plugin, I'm only really referencing JDBC indexes.

When I define a Logstash config file using the Elasticsearch output plugin, I have something like this:

output {
    elasticsearch {
        index => "sensors"
        document_type => "sensor"
        document_id => "%{sensor_id}"
        hosts => "localhost:9200"
    }
}

(This in particular is for a Postgres DB that has weather sensor data: temps, humidity, etc.) I'm not sure I understand what index/document_type/document_id mean in this instance.

I understand what an index is in the "scheme" of Elasticsearch itself, but maybe I'm just not sure of the concept in this scenario, nor am I sure of the right conventions for naming these things. I'm really just going to be looking at one or two tables in this scenario.

Can anyone explain exactly what these mean? If I wanted to search multiple tables, would I need multiple outputs, or just multiple types?

You can think of an index as a schema, a type as a table, and an id as a primary key.

If you have 2 tables which represent 2 different types of objects (I mean with no relationship between them), then yes, you need to index both.
If field names are shared, then you will be able to search on the common fields whatever the type is.
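
As a rough sketch of the two-table case, one pipeline can tag each JDBC input with a type and route on it in the output. The second table (station_data), its station_id column, the stations index, the connection string, the user and the driver path are all made up for illustration, so adjust them to your setup:

```
input {
    jdbc {
        # Assumed connection details; replace with your own.
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/weather"
        jdbc_user => "logstash"
        jdbc_driver_library => "/path/to/postgresql.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * FROM sensor_data"
        # Tags every event from this input so the output can route on it.
        type => "sensor"
    }
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/weather"
        jdbc_user => "logstash"
        jdbc_driver_library => "/path/to/postgresql.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # Hypothetical second table, unrelated to the first.
        statement => "SELECT * FROM station_data"
        type => "station"
    }
}

output {
    if [type] == "sensor" {
        elasticsearch {
            hosts => "localhost:9200"
            index => "sensors"
            document_id => "%{sensor_id}"
        }
    } else if [type] == "station" {
        elasticsearch {
            hosts => "localhost:9200"
            index => "stations"
            document_id => "%{station_id}"
        }
    }
}
```

Whether the two tables end up in one index or two is a judgment call; the conditional only decides which output block handles each event.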

That makes sense. We're going to be using Kibana for visual searching... but since Kibana pulls in all the data (my SQL statement is just select * from sensor_table), I guess I'm not sure of the point of having different types/ids, etc.

Kibana can just search through any of that, right? I guess I don't understand how splitting them up helps. I basically just want to be able to access various column data from 1 table (possibly 2 in the future). So do the types/ids really matter in that case?

I noticed in the MusicBrainz demo here:

They don't define anything on the elasticsearch output in the Logstash config file besides the protocol.

You don't have to set a type if you don't want to. It will default to something.
Maybe share what your tables look like at the moment (schema and some data) so I can better understand what you want to do?

Well, honestly this is sort of a proof of concept right now, but in general imagine the table "sensor_data" containing some columns like "sensor_id"/"current_temp"/"current_humidity"/"sensor_battery"/"last_read_timestamp"

With last_read_timestamp being when the sensor was read. Those are the basics of the important columns we'd want to be able to look at in Kibana.

I guess my question is: I've seen some configurations that leave out the index, type and id in the elasticsearch output configuration. I understand index now (that's just the name of the index where this data is being updated), and document_type makes sense, I suppose (especially if I had 2 separate tables, I'd probably want to separate the types, I assume).

However, the document_id (which in this case I have set to %{sensor_id}): is this worth setting if I just want to keep my index/data in sync with my database, instead of creating new indexes/datasets each time something changes in my database? Or is that incorrect?

I really have no need for multiple indexes. I really just want the database to be in sync with the index so we have a constantly updated and searchable index via Kibana.

Hopefully that sheds a little light on what I'm "attempting" to do, haha.

Yes. If you wish to update the same document, which means here having only one document per sensor, then setting this id is correct.

But I think I'd not do that. If you instead create a new document every time you have a new version of your sensor values, then you'll be able to display sensor values over time, which could be very nice.
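
To make the two options concrete, here is a hedged sketch of just the output block. The index name sensor-readings is made up, and the column names are the ones mentioned earlier in the thread:

```
output {
    elasticsearch {
        hosts => "localhost:9200"
        # Hypothetical index name for the history of readings.
        index => "sensor-readings"

        # Option A: no document_id at all. Elasticsearch generates an id,
        # so every row Logstash reads becomes a new document.

        # Option B: one document per sensor, overwritten on every read.
        # document_id => "%{sensor_id}"

        # Option C: one document per sensor per reading. Re-reading the
        # same row overwrites the same document instead of duplicating it.
        # document_id => "%{sensor_id}-%{last_read_timestamp}"
    }
}
```

With a plain select * that runs repeatedly, option A indexes every row again on each run, so something like option C is probably the safer way to keep history without duplicates.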

I guess the issue is that the sensor values are going to be changing a lot... so I'd imagine a lot of data could be created quite often. So I'd worry about the sheer amount of data being produced.

When we talk about documents in this case, are we talking about multiple indexes?

Also, when people say that Logstash JDBC doesn't update on deletes, only on inserts and selects, I'm a bit confused about what that means. Are they saying the indexes themselves don't update on deletes from the DB (i.e. if I delete a row of data from the database, the indexed document will not change)?

This whole index thing I really need to wrap my head around, ha!

Well, Elasticsearch is designed for that. We have users with 1 trillion docs now.

Not necessarily. It depends. If you have different kinds of data then yes, there might be multiple indices.

About LS JDBC: well, if you remove a row from your DB, how can LS be aware of it when it runs a SQL query?
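
To illustrate, here is a sketch of a typical scheduled pull. The connection string, user and driver path are placeholders, not your real settings. The query can only return rows that exist at the moment it runs, so a deleted row simply stops showing up, and nothing tells Elasticsearch to remove the document that was indexed earlier:

```
input {
    jdbc {
        # Assumed connection details; replace with your own.
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/weather"
        jdbc_user => "logstash"
        jdbc_driver_library => "/path/to/postgresql.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # Run the statement once a minute (cron-like syntax).
        schedule => "* * * * *"
        # :sql_last_value defaults to the time of the previous run, so each
        # run only picks up rows read since then. Rows deleted from the
        # table never appear in any result set.
        statement => "SELECT * FROM sensor_data WHERE last_read_timestamp > :sql_last_value"
    }
}
```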
