Logstash pipeline filter by date

ezequielo · May 24, 2017, 12:16pm

Hi all,

I have created a Logstash pipeline that fetches data from a database (MySQL) and loads it into ES.
It basically runs a SELECT over a couple of tables and outputs the result to ES. I have noticed that lately this is taking longer and longer, so I guess it is fetching all rows every time and trying to insert both new and old entries into ES.

If that's true, I need a mechanism to avoid fetching entries that are already in ES. What is the usual pattern for this? I have been thinking about two alternatives:

1. A new column in the database table indicating whether the row has already been loaded into ES
2. Keeping a datetime (in ES or in my database) and using it in the input query to fetch only newer rows
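To make the first alternative (a "loaded" flag column) concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table `items` and column `loaded_to_es` are hypothetical names for illustration only. It also shows the drawback mentioned below: every run has to write back to the source table.

```python
import sqlite3


def fetch_and_mark_unloaded(conn):
    """Return rows not yet loaded into ES, then flag them as loaded."""
    rows = conn.execute(
        "SELECT id, name FROM items WHERE loaded_to_es = 0 ORDER BY id"
    ).fetchall()
    # In a real pipeline the rows would be sent to Elasticsearch here,
    # and only flagged after the send succeeded.
    conn.executemany(
        "UPDATE items SET loaded_to_es = 1 WHERE id = ?",
        [(row[0],) for row in rows],
    )
    conn.commit()
    return rows


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with an extra flag column (the schema change option 1 needs).
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
        "loaded_to_es INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",)])
    first = fetch_and_mark_unloaded(conn)   # picks up a and b
    conn.execute("INSERT INTO items (name) VALUES ('c')")
    second = fetch_and_mark_unloaded(conn)  # only the new row c
    return first, second
```

Calling `demo()` returns `([(1, 'a'), (2, 'b')], [(3, 'c')])`. Note that rows updated after being flagged are never picked up again unless the flag is reset.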

I'm not sure about either of them, because they make assumptions about, or modify, the stored data. What do you think? Do you have any other alternative?

magnusbaeck · May 28, 2017, 6:53pm

The second option is a good one. If you maintain a column with the last-modified time of each row, Logstash can efficiently issue queries that return only the rows modified since the last run.
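The idea behind this is a "high-water mark": remember the largest last-modified value seen so far and only ask for rows newer than it. In Logstash this is what the jdbc input's `:sql_last_value` placeholder (together with `use_column_value`, `tracking_column` and `tracking_column_type`) is for. The sketch below imitates that behaviour in plain Python, again with sqlite3 standing in for MySQL; the table `items` and column `last_modified` are hypothetical.

```python
import sqlite3


def fetch_modified_since(conn, last_value):
    """Return rows modified after last_value plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, last_modified FROM items "
        "WHERE last_modified > ? ORDER BY last_modified",
        (last_value,),
    ).fetchall()
    # Keep the largest timestamp seen so the next run starts from there,
    # similar in spirit to Logstash's :sql_last_value.
    new_last_value = rows[-1][2] if rows else last_value
    return rows, new_last_value


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, "
        "last_modified TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO items (name, last_modified) VALUES (?, ?)",
        [("a", "2017-05-20 10:00:00"), ("b", "2017-05-21 09:30:00")],
    )
    last = "1970-01-01 00:00:00"  # initial value: fetch everything once
    first, last = fetch_modified_since(conn, last)
    # After the first run, row a is updated and a new row c is added.
    conn.execute(
        "UPDATE items SET name = 'a2', last_modified = '2017-05-22 08:00:00' "
        "WHERE id = 1"
    )
    conn.execute(
        "INSERT INTO items (name, last_modified) "
        "VALUES ('c', '2017-05-22 08:05:00')"
    )
    second, last = fetch_modified_since(conn, last)
    return first, second, last
```

The second run returns only the updated row `a2` and the new row `c`, and the mark advances to `2017-05-22 08:05:00`. The timestamps are stored as ISO-formatted text here so that string comparison matches time order. A strict `>` can skip rows that share the boundary timestamp but are committed after the query ran, so the resolution of the column matters.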

system · June 25, 2017, 7:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
