I have created a Logstash pipeline that fetches data from a MySQL database and loads it into Elasticsearch (ES).
It basically consists of a SELECT over a couple of tables whose results are output to ES. I have noticed that lately this is taking longer and longer, so I guess it's fetching all rows and trying to insert both new and old entries into ES.
If that's true, I need a mechanism to avoid fetching entries that are already in ES. What's the most common pattern for this? I have been thinking about two alternatives:
A new column in the database table indicating whether the row has already been loaded into ES
Keeping a datetime in ES or in my database, fetching that value, and using it in the input query
I'm not sure about either of those because they make assumptions about, or modify, the stored data. What do you think? Do you have any other alternative?
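For reference, this is roughly what I imagine the second alternative would look like using the JDBC input plugin's built-in `:sql_last_value` tracking (a sketch only; the table, column, host, and index names are placeholders, and it assumes the table has an `updated_at` timestamp column):

```
input {
  jdbc {
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "* * * * *"
    # Track the last seen value of updated_at; Logstash persists it
    # between runs (in last_run_metadata_path) as :sql_last_value.
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM my_table WHERE updated_at > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    # Using the primary key as document_id makes re-fetched rows
    # update the existing document instead of creating duplicates.
    document_id => "%{id}"
  }
}
```

But I'm not sure whether relying on a timestamp column like this is the accepted pattern or whether it has pitfalls I'm missing.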