I'm creating an index through Logstash and pushing data into it from a MySQL database. What I noticed in Elasticsearch is that once all the data is uploaded, it starts reporting some of the docs as deleted. The total number of docs is 160729. Without the scheduler it works fine.
I added the cron scheduler so the pipeline checks whether new rows have been added to the table. Could that be the issue?
If you are assigning document IDs based on the data in the database, re-indexing a row updates the existing document in Elasticsearch, and the old version of that document is then counted as deleted until the underlying segments are merged. The deleted count is therefore expected and does not mean data is being lost.
If records in the database are being updated, it looks like your scheduled pipeline is picking those changes up. Whether it is picking them up correctly I can't tell, as that depends on your data model and how you are extracting changes.
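For reference, a pipeline that does this typically combines a `schedule` and a `tracking_column` on the JDBC input with a stable `document_id` on the Elasticsearch output. A minimal sketch, where the database, table, and column names are hypothetical placeholders:

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"  # hypothetical DB
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # Run every minute and only fetch rows changed since the last run
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "updated_at"          # hypothetical change-tracking column
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM my_table WHERE updated_at > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    # Reusing the row's primary key as the document ID makes re-indexed
    # rows update the existing doc; the old version shows up as deleted.
    document_id => "%{id}"                   # hypothetical primary-key field
  }
}
```

With this setup the deleted-document count in the index stats will grow as rows are updated and re-indexed, which matches the behaviour described above.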