Bulk inserts more documents than given

Kostyantyn_Dobriohlo · September 4, 2023, 4:24pm

Elasticsearch is configured in single-node mode. I have ~1 million elements, but after a bulk insert operation I see 10 million elements. I use this Python code:

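from elasticsearch.helpers import bulk

def generate_docs(data):
    # Yield one bulk action per source item
    for item in data:
        doc = {
            '_index': 'my_index',
            '_source': item
        }
        yield doc

# client is an Elasticsearch instance; data holds the ~1 million items
bulk(client, generate_docs(data))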

What is the reason for this duplication, and is it a real problem?

dadoonet · September 4, 2023, 7:10pm

Where do you see 10m?

Are you using the nested field type in your mapping?

Kostyantyn_Dobriohlo · September 5, 2023, 6:06am

I see it when I call _cat/indices. And yes, I use the nested field type.

dadoonet · September 5, 2023, 6:30am

So that's expected.

Each nested object is indexed as its own Lucene document, and those Lucene documents are what the cat API counts.
If you run a search, you should get the right number.
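
To make the arithmetic concrete, here is a minimal sketch rather than anything official. It assumes a single level of nesting, a hypothetical nested field called "items", and a client object that exposes count() and cat.indices() the way the Python Elasticsearch client does. The helper names expected_lucene_docs and compare_counts are made up for illustration.

```python
def expected_lucene_docs(doc, nested_fields):
    """Return how many Lucene documents one source document produces.

    Assumes a single level of nesting: one root document plus one hidden
    document per object in each field mapped as `nested`.
    """
    total = 1  # the root document itself
    for field in nested_fields:
        value = doc.get(field) or []
        # A single object (not wrapped in a list) still counts as one nested doc
        total += len(value) if isinstance(value, list) else 1
    return total


def compare_counts(client, index):
    """Return (count API result, docs.count from _cat/indices) for an index."""
    # The count API reports top-level documents only
    visible = client.count(index=index)["count"]
    # _cat/indices reports the Lucene-level count, hidden nested docs included
    rows = client.cat.indices(index=index, h="docs.count", format="json")
    lucene = int(rows[0]["docs.count"])
    return visible, lucene


# Hypothetical document with nine objects in a nested field named "items"
sample = {"name": "order-1", "items": [{"sku": i} for i in range(9)]}
print(expected_lucene_docs(sample, ["items"]))  # 10
```

With roughly nine nested objects per item, 1 million source items would show up as about 10 million in _cat/indices, while compare_counts(client, 'my_index') should return the 1 million figure as its first value.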

system · October 3, 2023, 6:31am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
