I'm a bit confused about JDBC indexes/document_types

matt40413 · March 8, 2017, 4:46pm

So I'm a bit new to this admittedly, and I've been reading up on exactly what "indexes" are, and it mostly makes sense. However, as I'm just using this with the Logstash JDBC plugin, I'm only really referencing JDBC indexes.

When I define a logstash config file using the Elasticsearch plugin I will have something like this:

output {
    elasticsearch {
        index => "sensors"
        document_type => "sensor"
        document_id => "%{sensor_id}"
        hosts => "localhost:9200"
    }
}

(This in particular is for a Postgres DB that has weather sensor data: temps, humidity, etc.) I'm not sure I understand what index/document_type/document_id mean in this instance.

I understand what an index is in the "scheme" of Elasticsearch itself, but maybe I'm just not sure of the concept in this scenario, nor am I sure of the right conventions for naming these things. I'm really just going to be looking at one or two tables in this scenario.

Can anyone explain exactly what these mean? If I wanted to search multiple tables, would I need multiple outputs, or just multiple types?

dadoonet · March 8, 2017, 5:03pm

You can think of an index as a schema, a type as a table, and an id as a primary key.

If you have 2 tables which represent 2 different types of objects (I mean no relationship), then yes, you need to index both.
If field names are shared, then you will be able to search on the common fields whatever the type is.
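
To make the analogy concrete, here is a minimal sketch of how two unrelated tables might be routed to different types from one Logstash pipeline. The second table (`station_table`), its `station_id` column, the database name, the user, and the driver path are hypothetical placeholders, not details of the setup described in this thread.

```
input {
    jdbc {
        # Placeholder connection settings; adjust for your own database.
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/weather"
        jdbc_user => "logstash"
        jdbc_driver_library => "/path/to/postgresql-driver.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * FROM sensor_table"
        # Tag every row from this query so the output can route it.
        type => "sensor"
    }
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/weather"
        jdbc_user => "logstash"
        jdbc_driver_library => "/path/to/postgresql-driver.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # Hypothetical second table, used only for illustration.
        statement => "SELECT * FROM station_table"
        type => "station"
    }
}

output {
    if [type] == "sensor" {
        elasticsearch {
            hosts => "localhost:9200"
            index => "sensors"              # the "schema"
            document_type => "sensor"       # the "table"
            document_id => "%{sensor_id}"   # the "primary key"
        }
    } else if [type] == "station" {
        elasticsearch {
            hosts => "localhost:9200"
            index => "sensors"
            document_type => "station"
            document_id => "%{station_id}"  # hypothetical column
        }
    }
}
```

Note that mapping types were removed in later Elasticsearch versions; there, one index per table would play the role that `document_type` plays in this sketch.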

matt40413 · March 8, 2017, 5:06pm

That makes sense. We're going to be using Kibana to do visualized searching... but since Kibana pulls in all the data (my SQL statement is just select * from sensor_table), I guess I'm not sure of the point of having different types/ids, etc.

Since Kibana can just search through any of that, I guess I don't understand how splitting them up helps. I basically just want to be able to access various column data from 1 table (possibly 2 in the future). So do the types/ids really matter in that case?

I noticed in the MusicBrainz demo here:

they don't define anything on the elasticsearch output in the logstash config file besides the protocol.

dadoonet · March 8, 2017, 5:22pm

You don't have to set a type if you don't want to. It will default to something.
Maybe share what your tables look like at the moment (schema and some data) so I can better understand what you want to do?

matt40413 · March 8, 2017, 5:33pm

Well, honestly this is sort of a proof of concept right now, but in general imagine the table "sensor_data" containing some columns like "sensor_id"/"current_temp"/"current_humidity"/"sensor_battery"/"last_read_timestamp"

With last_read_timestamp being when the sensor was read. But those are the basics of the important columns we'd want to be able to look at in Kibana.

I guess my question is, I've seen some configurations that leave out the index, type, and id in the elasticsearch output configuration. I understand index now (as that's just the name of the index where this data is being updated), and document_type makes sense I suppose (especially in the case where I'd have 2 separate tables, I'd probably want to separate the types, I assume).

However, about the document_id (which in this case I have set to %{sensor_id}): is this worth setting if I just want to keep my index/data in sync with my database, instead of creating new indexes/datasets each time something changes in my database? Or is that incorrect?

I really have no need for multiple indexes; I really just want the database to be in sync with the index so we have a constantly updated and searchable index via Kibana.

Hopefully that sheds a little light on what I'm "attempting" to do haha.

dadoonet · March 8, 2017, 7:17pm

Yes. If you wish to update the same document, which means here having only one document per sensor, then setting this id is correct.

But I think I'd not do that. If you instead create a new document every time you have a new version of your sensor values, then you'll be able to display sensor values over time, which could be very nice.
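
A rough sketch of the "one document per reading" approach might look like the following. It assumes the `sensor_data` table and `last_read_timestamp` column mentioned earlier in the thread; the connection settings, the index name `sensor-readings-*`, and the `reading` type are hypothetical. The idea is to fetch only rows whose timestamp is newer than the last run and to omit `document_id`, so each fetched row becomes a new document instead of overwriting the previous one.

```
input {
    jdbc {
        # Placeholder connection settings; adjust for your own database.
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/weather"
        jdbc_user => "logstash"
        jdbc_driver_library => "/path/to/postgresql-driver.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # Run the query once a minute (cron-like syntax).
        schedule => "* * * * *"
        # Only fetch rows read since the previous run.
        statement => "SELECT * FROM sensor_data WHERE last_read_timestamp > :sql_last_value"
        use_column_value => true
        tracking_column => "last_read_timestamp"
        tracking_column_type => "timestamp"
    }
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        # One index per day, named from the event's @timestamp
        # (by default, the time Logstash read the row).
        index => "sensor-readings-%{+YYYY.MM.dd}"
        document_type => "reading"
        # No document_id here: Elasticsearch generates a new id for
        # every event, so older readings are kept rather than replaced.
    }
}
```

With daily indices like this, old readings can be dropped by deleting whole indices, which may help if the volume of data becomes a concern.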

matt40413 · March 14, 2017, 2:21pm

I guess the issue is, the sensor values are going to be changing a lot... so I'd imagine a lot of data could be created quite often. So I'd worry about just the amount of data being produced.

When we talk about documents in this case are we talking about multiple indexes?

Also, when people refer to Logstash JDBC not updating on deletes but only on inserts and selects, I'm a bit confused about what that means. Are they saying the indexes themselves are not updated on deletes from the DB (i.e. if I delete a row of data from the database, the index document will not change)?

This whole index thing I really need to wrap my head around ha!

dadoonet · March 14, 2017, 8:27pm

Well. Elasticsearch is designed for that. We have users with 1 trillion docs now.

Not necessarily. It depends. If you have different kinds of data then yes, there might be multiple indices.

About LS JDBC, well. If you remove a document from your DB, how can LS be aware of it when it runs a SQL query?
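
One possible workaround, sketched here under assumptions that are not part of this thread, is a "soft delete": rather than removing the row, the application sets a flag that the SQL query can still see. The boolean `is_deleted` column below is hypothetical, and the jdbc input's query would need to keep returning flagged rows for this to work.

```
output {
    if [is_deleted] {
        # Row was flagged as deleted in the database (hypothetical
        # is_deleted column): remove the matching document.
        elasticsearch {
            hosts => "localhost:9200"
            index => "sensors"
            document_type => "sensor"
            document_id => "%{sensor_id}"
            action => "delete"
        }
    } else {
        # Normal case: create or overwrite the document for this sensor.
        elasticsearch {
            hosts => "localhost:9200"
            index => "sensors"
            document_type => "sensor"
            document_id => "%{sensor_id}"
        }
    }
}
```

This only applies when `document_id` is set to a stable key, since the delete action has to address the same document that was indexed earlier.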

