Hello, please help: I inserted data with Spark several times, but the count was always bigger than expected. How can that be?
The ES version is 8.5.0.
The query that I used to check: GET index/_count
How much bigger is it?
Are you setting the id of each document, or letting Elasticsearch create one?
Did you run the job multiple times, or are you cleaning the index before starting a new job?
- ES shows me 117 744 651, but in Spark I have 115 925 542;
- ES creates the id;
- For sure, each time before a new job I ran
DELETE index
IMO something like a retry is happening. Since Elasticsearch generates the id, each retried write indexes a new document and creates duplicates.
You should, if possible, set the id from your documents, which would avoid duplicates.
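For example, with the elasticsearch-spark connector you can point es.mapping.id at a column of your DataFrame. A minimal sketch (the doc_id column, the parquet path, the index name, and the connection settings are placeholders for whatever your job actually uses):

```scala
import org.apache.spark.sql.SparkSession

object WriteWithIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-with-ids")
      // Connection settings for the elasticsearch-spark connector (placeholders).
      .config("es.nodes", "localhost")
      .config("es.port", "9200")
      .getOrCreate()

    // Hypothetical source; the important part is that every row carries
    // a stable, unique identifier in some column (here called "doc_id").
    val df = spark.read.parquet("/data/events")

    df.write
      .format("org.elasticsearch.spark.sql")
      // Use the "doc_id" column as the Elasticsearch _id, so a retried
      // write re-indexes the same document instead of adding a duplicate
      // under a new auto-generated id.
      .option("es.mapping.id", "doc_id")
      .mode("append")
      .save("index")

    spark.stop()
  }
}
```

With es.mapping.id set, a retried bulk request overwrites the same _id rather than creating a second document, so GET index/_count should line up with the Spark row count.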
Thank you, that was the case.