Elasticsearch does not index all documents



I tried to index a CSV file of 10 000 000 lines into Elasticsearch via the Bulk API. That succeeded and it works.
However, the CSV file contains several duplicates, so I decided to set the _id of each document myself. Now, when I index the CSV file, I get fewer than 10 000 000 documents, which is logical.
What is less logical is that a first indexing run gives me 9 989 339 documents, a second run 9 278 194, a third run 9 584 239 ... I never get the same number. Why is it not working correctly? What is wrong with my script?
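
For reference, the expected document count is the number of distinct _id values in the file, so it can help to compute that number independently of Elasticsearch and compare it with each run. A minimal sketch, assuming the CSV has a header row and that a single column (hypothetically named `id` here) is used as the _id; the real column name and _id construction are not given in this thread:

```python
import csv
import io


def count_unique_ids(csv_file, id_column):
    """Return (total_rows, distinct_ids) for one column of a CSV with a header row."""
    reader = csv.DictReader(csv_file)
    seen = set()
    total = 0
    for row in reader:
        total += 1
        seen.add(row[id_column])
    return total, len(seen)


# Tiny in-memory sample; "id" is a hypothetical column name.
sample = io.StringIO("id,name\n1,a\n2,b\n1,c\n")
total, unique = count_unique_ids(sample, "id")
print(total, unique)
```

If every run indexes the whole file, the document count after a refresh should match the distinct count every time. Keeping 10 000 000 ids in a set needs a fair amount of memory, so this is only a rough check.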

(Christian Dahlqvist) #2

Have you run a refresh after completing indexing? Did you see any errors in the responses for the bulk requests?
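
One way to check the responses is to tally the per-item results of every bulk response instead of looking only at the top-level `errors` flag. The sketch below assumes the response has already been parsed into a Python dict with the usual Bulk API shape (an `items` list where each entry has one key, the action name); the helper name and the sample response are made up for illustration:

```python
from collections import Counter


def summarize_bulk_response(response):
    """Count per-item outcomes (created, updated, failed, ...) in one bulk response."""
    counts = Counter()
    for item in response.get("items", []):
        # Each item holds a single key: the action name (index, create, update, delete).
        _action, result = next(iter(item.items()))
        if "error" in result:
            counts["failed"] += 1
        else:
            counts[result.get("result", "unknown")] += 1
    return counts


# Hand-written sample response, not real output from a cluster.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201, "result": "created"}},
        {"index": {"_id": "1", "status": 200, "result": "updated"}},
        {"index": {"_id": "2", "status": 429, "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
print(summarize_bulk_response(sample_response))
```

Summed over all bulk requests, the `created` total should equal the number of distinct _id values, `updated` should account for the duplicates, and any `failed` items (for example rejections with status 429) would explain documents that go missing from one run to the next.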


Yes, I ran a refresh after indexing completed, and there were no errors in the responses to the bulk requests.

(David Pilato) #4

What is your script?
