I am about to start importing SQL dumps and API responses from one of our systems. I have realised that those dumps and API responses will mostly contain the same information each time, but also:
some of the data might get updated (like tables with user details and last_login_time)
some of the data might get removed (user has removed their account)
some of the data might be added (new users added).
How do I handle this in ES? sincedb_path is not helpful at all; it is only useful for streaming data. Even when it did detect that the SQL dump had changed by only one record, the filters failed, because Logstash applied them to the new data only. Why? Because the dump is in JSON format, and the pipeline runs the 'split' filter first, which obviously doesn't work on just the tiny piece of data that changed.
Find some unique but static values, stitch them together to form an _id, and use that as the document ID. Then, when the same record is imported again, it will simply update (overwrite) the existing document instead of creating a duplicate.
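A minimal sketch of that idea using Logstash's fingerprint filter — the field names `user_id` and `account_type`, the index name, and the host are assumptions to adapt to your data:

```
filter {
  # Hash one or more stable fields into a deterministic document ID.
  # "user_id" and "account_type" are placeholders; use whatever
  # columns in your dump never change for a given record.
  fingerprint {
    source => ["user_id", "account_type"]
    concatenate_sources => true
    method => "SHA1"
    target => "[@metadata][doc_id]"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "users"
    # Same record -> same _id, so a re-import overwrites the
    # existing document rather than indexing a new one.
    document_id => "%{[@metadata][doc_id]}"
    action => "update"
    doc_as_upsert => true
  }
}
```

Note this handles new and updated records; rows deleted upstream will still sit in the index until you remove them yourself (e.g. by a separate cleanup query).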