I tried to index a CSV file of 10 000 000 lines into Elasticsearch via the Bulk API. That succeeded and it works.
But the CSV file contains several duplicates, so I decided to set the _id of each document myself. Now, when I index the CSV file, I don't get 10 000 000 documents but fewer, which is logical.
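To make the expectation concrete: with an explicit _id, an `index` action on an _id that already exists overwrites that document, so duplicate rows collapse into one. The final document count should therefore equal the number of distinct _id values and be the same on every run. Below is a minimal stdlib-only sketch, assuming a hypothetical scheme where the _id is a SHA-1 hash of the whole CSV row and the document body is a hypothetical `{"fields": [...]}` object. It is not my actual script, just an illustration of the NDJSON bulk payload and of how to compute the count you should expect.

```python
import csv
import hashlib
import io
import json


def doc_id(row):
    # Hypothetical _id scheme: hash the whole row, so identical lines share one _id
    key = "\x1f".join(row)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def bulk_payload(rows, index="my-index"):
    # Build an NDJSON Bulk API body: one action line, then one source line per row.
    # "my-index" and the {"fields": ...} body shape are assumptions for illustration.
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id(row)}}))
        lines.append(json.dumps({"fields": row}))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline


def expected_doc_count(rows):
    # With explicit _ids, duplicates overwrite each other, so the index should
    # end up with exactly one document per distinct _id
    return len({doc_id(row) for row in rows})


if __name__ == "__main__":
    sample = "a,1\nb,2\na,1\nc,3\n"
    rows = list(csv.reader(io.StringIO(sample)))
    print(expected_doc_count(rows))  # → 3 (the row "a,1" appears twice)
```

Comparing `expected_doc_count` over the full file with the count Elasticsearch reports after indexing shows whether documents are missing rather than merely deduplicated.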
What is less logical is that a first indexing run gives me 9 989 339 documents, a second run 9 278 194, and a third run 9 584 239... I never get the same number. Why isn't it working correctly? What's wrong with my script?