Hello, please help: I inserted data with Spark several times, but the count was always bigger than expected. How can that be?
The ES version is 8.5.0.
The query that I used to check: GET index/_count
How much bigger is it?
Are you setting the id of each document, or letting Elasticsearch create one?
Did you run the job multiple times, or are you cleaning the index before starting a new job?
- ES shows me 117 744 651, but in Spark I have 115 925 542;
- ES creates the id;
- For sure, each time before a new job I ran
DELETE index
IMO something like a retry is happening. Since Elasticsearch generates the id, each retried write indexes a new document and creates duplicates.
You should, if possible, set the id from your documents, which would avoid duplicates.
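For example, with the elasticsearch-spark connector you can point es.mapping.id at a column of your DataFrame. A minimal sketch (the doc_id column, the parquet path, the index name, and the connection settings are placeholders for whatever your job actually uses):

```scala
import org.apache.spark.sql.SparkSession

object WriteWithIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-with-ids")
      // Connection settings for the elasticsearch-spark connector (placeholders).
      .config("es.nodes", "localhost")
      .config("es.port", "9200")
      .getOrCreate()

    // Hypothetical source; the important part is that every row carries
    // a stable, unique identifier in some column (here called "doc_id").
    val df = spark.read.parquet("/data/events")

    df.write
      .format("org.elasticsearch.spark.sql")
      // Use the "doc_id" column as the Elasticsearch _id, so a retried
      // write re-indexes the same document instead of adding a duplicate
      // under a new auto-generated id.
      .option("es.mapping.id", "doc_id")
      .mode("append")
      .save("index")

    spark.stop()
  }
}
```

With es.mapping.id set, a retried bulk request overwrites the same _id rather than creating a second document, so GET index/_count should line up with the Spark row count.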
Thank you, that was the case.