Losing data in Elasticsearch


(Ikenna Darlington) #1

I have Elasticsearch 2.0.0 set up with the Kafka Elasticsearch connector. I recently discovered I am losing data: I can't see some old data anymore. Data is lost in a FIFO way (oldest first).

At first, I thought it was a heap error and increased my heap size, but it is still happening.

Has anyone experienced this? Any pointers on what to do?

I would appreciate any help.


(Mark Walkom) #2

How do you know it is being lost?


(Ikenna Darlington) #3

I search for the data based on the period and can't find it. I also have a graphical display of the data on a webpage, which is what alerted me to this.


(Mark Walkom) #4

And are you sure it's making it to kafka and to Elasticsearch?


(Ikenna Darlington) #5

Yes, the data was already saved and I could see it, but after a while it is gone from Elasticsearch.


(Christian Dahlqvist) #6

Have you by any chance got Curator set up in a cron job to delete old indices? Is there anything in the Elasticsearch logs about indices being deleted?


(Ikenna Darlington) #7

No, this is a new setup; I have not set up Curator yet. I checked the logs but couldn't see anything. I can share my logs if that helps.

I installed elasticsearch from the source (apt-get)


(Mark Walkom) #8

Can you explain this more, please?


(Christian Dahlqvist) #9

How did you verify the data reached Elasticsearch?


(Ikenna Darlington) #10

This:


(Christian Dahlqvist) #11

Does this mean that you inspected the data through Kibana? Did you query Elasticsearch through the APIs?


(Ikenna Darlington) #12

So I am ingesting data from Kafka into Elasticsearch using the Kafka Elasticsearch connector.

I can view the data as it comes in via a graph (aggregated by day, built with d3). After a while I notice that days which formerly had data are empty. So I check using Sense and find that all of the data, or sometimes part of it, is missing. This is data I previously verified was there.

I hope this helps.


(Ikenna Darlington) #13

I used Sense to search within the period that was missing.
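For reference, such a search in Sense might look like the sketch below. The index name grits appears later in the thread; the timestamp field name and the date range are assumptions for illustration:

```
GET grits/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2016-01-01",
        "lte": "2016-01-07"
      }
    }
  }
}
```

If this returns zero hits for a period that previously had data, the documents are genuinely gone from the index rather than just missing from the graph.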


(Mark Walkom) #14

Elasticsearch won't just delete data on its own, so something else must be requesting the deletion.


(Christian Dahlqvist) #15

Are you using time-based indices? What does GET /_cat/indices show?


(Ikenna Darlington) #16
```
health status index                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   grits                     5   1   23408051      4208025      2.7gb          2.7gb
yellow open   .kibana                   1   1          1            0      2.9kb          2.9kb
yellow open   test-elasticsearch-sink   5   1          0            0       785b           785b
yellow open   grit                      5   1          0            0       785b           785b
```

grits is the main index


(Christian Dahlqvist) #17

Are you assigning an external ID to the documents you are indexing or are you allowing Elasticsearch to automatically generate an ID?


(Ikenna Darlington) #18

The Kafka Elasticsearch connector generates the IDs: http://docs.confluent.io/current/connect/connect-elasticsearch/docs/elasticsearch_connector.html


(Christian Dahlqvist) #19

The index stats show that you have deleted documents in your index (docs.deleted is over 4 million). This may indicate that you have indexed multiple documents with the same ID, which causes an update (a delete plus creation of a new document), or that you have deleted documents by some other means, e.g. through the delete-by-query API. Elasticsearch does not delete data by itself.

If you have the ID of a document that you would expect to find in the period for which data is no longer found, you could look it up and see if it has been updated or deleted.
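As a concrete sketch of that lookup in Sense (the type name log is an assumption; SOME_ID stands in for a real document ID, which is deliberately left as a placeholder):

```
GET grits/log/SOME_ID
```

If the response shows "found": true with a "_version" greater than 1, the document has been overwritten (updated) at least once; if it shows "found": false for an ID you know was indexed, it has been deleted.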


(Christian Dahlqvist) #20

As you are using such an old version of Elasticsearch, it just struck me that TTL might be enabled, which would cause documents to be deleted automatically once they expire. TTL was deprecated in 2.0 but was still available. Get the settings for the index through the get index settings API, and also check the mappings for the index and look for any _ttl related settings.
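In Sense, those two checks might look like this (using the grits index from earlier in the thread):

```
GET grits/_settings
GET grits/_mapping
```

In the mapping output, look for an entry like "_ttl": { "enabled": true } on any type; if it is present, documents in that type expire and are purged after their TTL elapses, which would match the oldest-first data loss described above.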