I am using the JDBC input plugin to import data from Amazon Redshift into Elasticsearch using Logstash.
I am processing incremental updates for a very big table that gains around 2 million rows every hour, with a timestamp attached to each row.
The problem I am facing is that since data from Redshift does not come back in sorted order, in order to process a batch update using :sql_last_value I have to filter the latest ~2 million rows and then sort them, which is taking a lot of time.
Is there any workaround for this problem so that sql_last_value stores the maximum of the currently processed batch rather than the last value, which requires the input to be sorted on the column assigned to sql_last_value?
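For reference, here is a minimal sketch of the kind of pipeline I am describing (the connection details, table name, and timestamp column are placeholders, not my real values):

```
input {
  jdbc {
    # Placeholder connection details for a Redshift cluster
    jdbc_connection_string => "jdbc:redshift://my-cluster.example.com:5439/mydb"
    jdbc_user => "myuser"
    jdbc_password => "mypassword"
    jdbc_driver_library => "/path/to/RedshiftJDBC42.jar"
    jdbc_driver_class => "com.amazon.redshift.jdbc42.Driver"
    schedule => "0 * * * *"          # run once an hour
    use_column_value => true         # track a column value, not the run time
    tracking_column => "updated_at"  # placeholder timestamp column
    tracking_column_type => "timestamp"
    # :sql_last_value is substituted with the stored tracking value;
    # the ORDER BY over ~2M unsorted rows is the expensive step
    statement => "SELECT * FROM big_table WHERE updated_at > :sql_last_value ORDER BY updated_at ASC"
  }
}
```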
The problem I am facing is that since data from Redshift does not come back in sorted order, in order to process a batch update using :sql_last_value I have to filter the latest ~2 million rows and then sort them, which is taking a lot of time.
Can't you let the jdbc input run more often than once an hour so that each batch becomes smaller?
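For example (a hypothetical cron-style schedule; pick whatever interval suits your ingest rate), polling every five minutes would cut each batch to roughly a twelfth of the hourly volume:

```
# Hypothetical: poll every 5 minutes instead of hourly
schedule => "*/5 * * * *"
```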
Is there any workaround for this problem so that sql_last_value stores the maximum of the currently processed batch rather than the last value
Sorry, I don't understand the difference.
which requires the input to be sorted on the column assigned to sql_last_value?
If you're only using a timestamp from a column to keep track of what has been processed, I don't see how you can possibly avoid sorting the rows before processing them.
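Concretely, I would order the statement by the tracking column, something like this (table and column names are assumptions); as I understand the plugin, sql_last_value is taken from the last row it reads, so the ORDER BY guarantees that value is also the maximum:

```
# Assumed table and column names; ordering ascending means the final row
# of the result set carries the highest timestamp, which is what gets
# persisted as :sql_last_value for the next run
statement => "SELECT * FROM big_table WHERE updated_at > :sql_last_value ORDER BY updated_at ASC"
```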
For the second part: since the rows returned are not in sorted order, what value does :sql_last_value store for the timestamp column assigned to it? Will it be the timestamp of the last processed row (which might not be the latest timestamp, because of Redshift's unsorted output), or will it store the maximum of the timestamps processed in the current batch?