Jdbc input plugin and sql_last_value

rubpa · August 24, 2018, 8:30am

I'm considering replicating a MySQL database into Elasticsearch to enable Kibana visualizations on that data. Another application runs off the same database and updates the existing rows.

It appears that Logstash with the Jdbc input plugin is the best way to do this. I found the sql_last_value option, which can be used to detect changes and incrementally update the Elasticsearch data. However, my database does not have any column that indicates which records were updated.

If the plugin query is just SELECT * FROM my_table, would these incremental updates work? In other words, does the plugin or MySQL have any inbuilt feature that allows the plugin to figure out updated records?

magnusbaeck · August 27, 2018, 8:14pm

does the plugin or MySQL have any inbuilt feature that allows the plugin to figure out updated records?

No.

What you might be able to do is fetch all rows and store them in ES with the same document id each time, i.e. so you'll be overwriting the same documents over and over again. It's clearly inefficient (perhaps prohibitively so), but if there's no way to figure out the modified rows it's the best you can do.

rubpa · August 28, 2018, 7:33am

So, as I understand it, if I use the primary key id of my table as the document_id in the elasticsearch output, it would overwrite all documents every time it runs (as per the schedule). The table is about 32MB according to MySQL, with about 30k records. In my opinion, with a 5 min schedule, this amount of data should be tiny for my single-node Elasticsearch cluster.
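For reference, a minimal sketch of the full-reload pipeline described above. The 5 minute schedule, the my_table query and the id primary key come from this thread; the driver path, driver class, connection string, credentials, host and index name are placeholders I am assuming for illustration.

```
input {
  jdbc {
    # Placeholder connection settings (assumptions, adjust to your setup).
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/my_database"
    jdbc_user => "logstash"
    jdbc_password => "changeme"
    # Run every 5 minutes (cron-like syntax).
    schedule => "*/5 * * * *"
    # Fetch the whole table on every run.
    statement => "SELECT * FROM my_table"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_table"
    # Reuse the primary key as the document id so that each run
    # overwrites the existing document instead of adding a duplicate.
    document_id => "%{id}"
  }
}
```

Because every row is re-sent on each run, the cost grows with the table size rather than with the number of changed rows.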

Of course, this solution would not scale if I also want other tables - I do have one with 3 million records.

magnusbaeck · August 28, 2018, 7:33am

Yes, your understanding is correct.

rubpa · August 28, 2018, 7:59am

I do have the option to modify the database. I came across this and this, which suggest using an updatedAt field.

If I do set up my table as suggested, I guess my query would look like SELECT * FROM my_table WHERE updatedAt > :sql_last_value ORDER BY updatedAt. Can you confirm the following? It is not very obvious from the plugin documentation.

- that the ORDER BY clause is compulsory
- using sql_last_value in the query means use_column_value and tracking_column become mandatory
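A minimal sketch of what the incremental setup could look like, assuming the updatedAt column is a MySQL TIMESTAMP that the database maintains itself. The query is the one proposed above; the connection settings, host and index name are placeholders, and the ALTER TABLE statement in the comment is only one possible way to add the column.

```
# One way to add a self-maintaining column in MySQL (an assumption, run once):
#   ALTER TABLE my_table ADD COLUMN updatedAt TIMESTAMP NOT NULL
#     DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

input {
  jdbc {
    # Placeholder connection settings (assumptions, adjust to your setup).
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/my_database"
    jdbc_user => "logstash"
    jdbc_password => "changeme"
    schedule => "*/5 * * * *"
    # Only fetch rows changed since the last value seen; ORDER BY ensures
    # the last row of the result carries the highest updatedAt.
    statement => "SELECT * FROM my_table WHERE updatedAt > :sql_last_value ORDER BY updatedAt"
    # Track the value of a column rather than the time of the last run.
    use_column_value => true
    # The plugin lowercases column names by default
    # (lowercase_column_names => true), so the name is given in lowercase.
    tracking_column => "updatedat"
    tracking_column_type => "timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_table"
    # Same document id as before, so updated rows replace their old documents.
    document_id => "%{id}"
  }
}
```

Keeping document_id set to the primary key still matters here: without it, each updated row would be indexed as a new document rather than replacing the previous version.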

magnusbaeck · August 28, 2018, 8:39am

that the ORDER BY clause is compulsory

Yes.

using sql_last_value in the query means use_column_value and tracking_column become mandatory

Yes.

