Regarding missing data in an index

I realize that Elasticsearch is not an ACID-compliant data store, but I have a question about missing data. Out of roughly 5 million relatively small documents (300 to 1,000 bytes of source each), we've noticed that a small number (perhaps only 100) are simply missing from the index, even though we can easily see where they were added without error. We have also been able to re-add those documents from our database source without error, which indicates that the source didn't contain anything strange that might have tripped up ES. We haven't had any node failures. Is it possible, in Elasticsearch / Lucene, for some documents to simply be lost under ordinary circumstances (assuming no node failures or known errors of any kind)?

My understanding is that we should be able to reproduce the index from our database as the source of truth, but that we should only expect to need to do this (or even to check for such inconsistencies) in a failure scenario (lost node, etc.). Is that understanding incorrect?
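To make the consistency check mentioned above concrete, here is a minimal sketch of comparing the database's document IDs against the IDs found in the index. The function names and the sample IDs are hypothetical and only for illustration; how the two ID lists are collected is left out on purpose.

```python
def find_missing_ids(source_ids, indexed_ids):
    """Return IDs that exist in the source of truth but not in the index."""
    return sorted(set(source_ids) - set(indexed_ids))


def find_orphaned_ids(source_ids, indexed_ids):
    """Return IDs that exist in the index but no longer in the source of truth."""
    return sorted(set(indexed_ids) - set(source_ids))


if __name__ == "__main__":
    # Hypothetical sample data standing in for real ID listings.
    database_ids = [101, 102, 103, 104, 105]
    index_ids = [101, 102, 104, 105]

    print("missing from index:", find_missing_ids(database_ids, index_ids))
    print("orphaned in index:", find_orphaned_ids(database_ids, index_ids))
```

In practice the indexed IDs could be collected with the `scan` helper from `elasticsearch.helpers`, with `_source` disabled in the query so that only the IDs are transferred. That part is an assumption about the setup, not something described in the post.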

That sounds strange. Which version of Elasticsearch are you using? What is the size of the cluster? How do you index the data?

Hey @Christian_Dahlqvist,

Thanks for the reply. We are using Elasticsearch 7.1 on six nodes, with 6 shards, each with 2 replicas, totaling around 18 GB so far (including replicas). We index the data via python-elasticsearch, and we use custom routing based on the integer ID of the company that owns each document.
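For readers unfamiliar with custom routing, here is a simplified sketch of how a routing value selects a shard, and why a lookup has to supply the same routing value that was used at index time. Several things here are assumptions: "6 shards" is read as 6 primary shards, `zlib.crc32` is only a stand-in for the Murmur3 hash Elasticsearch actually uses, and the function names are made up for illustration. It does not claim to explain the missing documents in this thread.

```python
import zlib

NUM_PRIMARY_SHARDS = 6  # assumption: "6 shards" in the post means 6 primaries


def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Map a routing value to a shard number.

    Stand-in hash only: Elasticsearch uses Murmur3, and its real formula
    also involves the number of routing shards.
    """
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_shards


def shard_for_index(doc_id, company_id):
    """With custom routing, the company ID (not the document ID) picks the shard."""
    return shard_for(company_id)


def shard_for_lookup(doc_id, routing=None):
    """Without an explicit routing value, the document ID is used for routing."""
    return shard_for(doc_id if routing is None else routing)


if __name__ == "__main__":
    doc_id, company_id = "doc-1001", 42  # hypothetical values
    print("indexed on shard:", shard_for_index(doc_id, company_id))
    print("lookup with routing:", shard_for_lookup(doc_id, routing=company_id))
    print("lookup without routing:", shard_for_lookup(doc_id))
```

A lookup with the matching routing value always lands on the shard the document was indexed on. A lookup without it is routed by the document ID and may land on a different shard. Likewise, the same document ID indexed under two different routing values can end up on two different shards.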
