I have Elasticsearch 2.0.0 set up with the Kafka Elasticsearch connector. I recently discovered I am losing data: I can no longer see some of my older data. The data is being lost in a FIFO fashion (oldest first).
At first I thought it was a heap error and increased my heap size, but it is still happening.
Has anyone experienced this? Any pointers on what to do?
I search for the data based on the period and can't find it. I also have a graphical display of the data on a webpage, which is what alerted me to this.
I can view the data as it comes in via a graph (aggregated by day, created with d3). After a while I notice that days which formerly had data are empty. So I check using Sense and find that either all of the data, or more often some part of it, is missing; this is data I had previously verified was there.
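To make it concrete, the check I run in Sense is roughly of this shape (the index name and timestamp field are placeholders for my actual ones), and days that used to have buckets now come back empty:

```
GET /my_index/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-30d/d",
        "lte": "now/d"
      }
    }
  },
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      }
    }
  }
}
```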
The index stats show that you have deleted documents in your index. This may indicate that you have indexed multiple documents with the same ID, which causes an update (a delete plus the creation of a new document), or that you have deleted documents by some other means, e.g. through the delete-by-query API. Elasticsearch does not delete data by itself.
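You can see this in the doc counts from the index stats API (index name below is a placeholder):

```
GET /my_index/_stats/docs
```

In the response, `docs.count` is the number of live documents and `docs.deleted` is the number of deleted (or overwritten) documents that have not yet been merged away.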
If you have the ID of a document that you would expect to find in the period for which data is no longer found you could look this up and see if it has been updated or deleted.
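For example (index, type and ID here are placeholders):

```
GET /my_index/my_type/AVabc123
```

If the document has been deleted you will get a `found: false` response; if it has been updated, its `_version` will be greater than 1.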
As you are using such an old version of Elasticsearch, it also struck me that TTL might be enabled, which would cause documents to be deleted automatically. It was deprecated in 2.0, but was still available. Get the settings for the index through the get index settings API, and also check the mappings for the index and look for any TTL-related settings.
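Something along these lines (again, the index name is a placeholder):

```
GET /my_index/_settings
GET /my_index/_mapping
```

In the mapping, look for a `_ttl` block on any of your types, e.g. `"_ttl": { "enabled": true, "default": "7d" }`. If it is present and enabled, documents will be purged automatically once their TTL expires.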