Elasticsearch not showing correct count of documents in index

Taby · April 27, 2023, 7:26pm

Hi.

Our Java 8 based application sends a total of 17061816 documents to Elasticsearch 7.17.4 for indexing. However, after all indexing is completed, curl _count shows only 16833817 indexed documents. There are no errors in the Elasticsearch log either. I've enabled DEBUG logging at the root logger, but still no significant error is shown. I am not sure what the reason could be. The input figure is already the count of distinct records.

Any help is highly appreciated.

leandrojmp · April 27, 2023, 7:34pm

Are you using a custom document _id or letting Elasticsearch choose the document _id?

Taby · May 16, 2023, 5:11am

Using a custom document _id. We still can't figure out the cause, as there are no errors in the Elasticsearch logs or in the application logs. The custom ID is the unique default ID auto-generated by Neo4j. We send 7500 documents per batch to the Bulk API. No errors or exceptions are received from Elasticsearch. The application drops all indexes, creates new ones on every start, and reindexes all documents. Repeating the process always shows the same number of documents missing in Elasticsearch.
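
A note on "no errors or exceptions": the Bulk API can answer with HTTP 200 even when individual actions fail, and it reports each action's outcome in the `items` array of the response body. The following is a minimal sketch, assuming the response has already been parsed from JSON into a dict. The function name `tally_bulk_response` and the sample response are illustrative and not taken from the application.

```python
from collections import Counter

def tally_bulk_response(response):
    """Count per-item outcomes in a parsed Bulk API response body."""
    tally = Counter()
    for item in response.get("items", []):
        # Each item has a single key naming the action (index, create, update, delete).
        action, detail = next(iter(item.items()))
        if "error" in detail:
            tally["failed"] += 1
        else:
            # "created" means a new _id; "updated" means the _id already existed.
            tally[detail.get("result", "unknown")] += 1
    return tally

# Hypothetical response for a three-document batch.
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201, "result": "created"}},
        {"index": {"_id": "1", "status": 200, "result": "updated"}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
print(tally_bulk_response(sample))
```

Summing these tallies over every batch would show whether the shortfall comes from rejected items or from `updated` results, which would point to repeated IDs.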

Christian_Dahlqvist · May 16, 2023, 5:25am

What does the index stats API show for the index when you have completed indexing? Do you see any evidence of deleted documents, which would indicate that you have had updates occur due to duplicate IDs?

Taby · May 16, 2023, 6:47am

The index stats API shows a total docs count of 16833817 and 221074 deleted. Even if we add the deleted count to the total (i.e. 17054891), it's still not the number of docs we're sending to Elasticsearch, which is 17061816.

Also, under indexing --> "index_total" : 16833817.

I've tried refreshing the index using the Refresh API, but the stats remain the same.
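
For reference, the arithmetic behind these figures can be written out as a short sketch. The variable names are illustrative; the numbers are the ones quoted in this thread.

```python
sent = 17_061_816     # documents the application sends
count = 16_833_817    # docs count from index stats (same as _count)
deleted = 221_074     # deleted docs from index stats

missing = sent - count                  # gap between sent and searchable docs
unexplained = sent - (count + deleted)  # gap left after adding deleted docs

print(missing, unexplained)  # 227999 6925
```

So 227999 documents are missing from the count, and 6925 remain unaccounted for even after adding the deleted documents.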

Christian_Dahlqvist · May 16, 2023, 6:53am

The number of deleted documents will change as segments are merged, so the figures will not necessarily tally up exactly. This does, however, indicate that your IDs are not unique and that you are seeing updates.

As a test, if you let Elasticsearch assign the IDs, you should see all documents ingested.
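
Repeated IDs can also be checked on the client side before anything is sent. This is a minimal sketch, assuming the IDs can be collected into an iterable. `find_duplicate_ids` and the sample IDs are hypothetical.

```python
from collections import Counter

def find_duplicate_ids(ids):
    """Return a dict of every ID that occurs more than once, with its occurrence count."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}

# Hypothetical IDs; in practice these would be the values sent as _id.
ids = ["a1", "b2", "a1", "c3", "b2", "a1"]
dupes = find_duplicate_ids(ids)

# Every repeat after the first overwrites an existing document instead of adding one.
overwritten = sum(n - 1 for n in dupes.values())
print(dupes, overwritten)
```

If `overwritten` comes out close to the number of missing documents, duplicate IDs are the likely explanation.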

Taby · May 16, 2023, 7:53am

I'm trying it and will share the results once indexing is done. Meanwhile, in a separate test, I reduced the batch size from 7500 to 2000 (on Elasticsearch 6.8.23) and got the correct count there, which is 17061816. The count mismatch occurs on Elasticsearch 7.17.4. The dataset and application code are the same on both ES 6.x and ES 7.x.

system · May 16, 2023, 7:53am

Elasticsearch 6.8.23 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns.)

Christian_Dahlqvist · May 16, 2023, 8:03am

Did you index into a new index on 6.8.23 or did you use an existing index?

Can you show a sample document?

Taby · May 16, 2023, 10:08am

New index. Every time the application starts, it drops the existing indexes and recreates new ones. I restart the application for every test.

Here is a sample document. The data has been changed for privacy purposes.
https://tmpfiles.org/1421890/document.json

I've tested with Elasticsearch-generated IDs (not using the custom ID) and the count is still not the same. In fact, it's 7500 more than what I was previously getting with the custom ID. Incidentally, 7500 is also the batch size we're sending to the Bulk API. So I think something is going on with the batch size as well.
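
Since the difference matches exactly one batch, client-side batch accounting is one thing worth checking. The following is a minimal sketch of batching with an explicit final flush, not the application's actual code. The `batches` helper is hypothetical.

```python
def batches(docs, size):
    """Yield lists of at most `size` documents, including the final partial batch."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the last, smaller batch is easy to drop by mistake
        yield batch

# Small demonstration: 20 documents in batches of 7.
sizes = [len(b) for b in batches(range(20), 7)]
print(sizes, sum(sizes))

# Expected shape for the real run: number of full batches and size of the last one.
full, rest = divmod(17_061_816, 7500)
print(full, rest)
```

Logging the number of documents in each batch alongside the number of items in each bulk response would show whether an entire batch is going missing between the application and Elasticsearch.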

