Understanding "id existence" check mechanism for custom ids

danslapman · June 2, 2025, 1:32pm

Greetings! I'd like to verify my understanding of what happens when a document is indexed with a custom _id. I know that ES performs an existence check in that case, but I'm curious about the details. As I understand it, ES does the following:

determine the shard (by hashing the id or the provided routing key) where the document is potentially stored
perform a lookup within that shard

So, roughly, my question is: "there is no full index scan in such a case, right?"

Thanks in advance

Christian_Dahlqvist · June 2, 2025, 1:41pm

That does sound correct. Only the shard the document is being indexed into is searched.
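The first step from the question, picking the shard, can be sketched roughly as below. This is a simplified illustration, not Elasticsearch's actual code: the function name `pick_shard` is hypothetical, CRC32 is only a stand-in for the hash Elasticsearch uses internally, and the real formula also takes the index's routing-shard settings into account.

```python
import zlib


def pick_shard(doc_id, num_primary_shards, routing=None):
    """Return the shard number a document maps to (illustrative only)."""
    # A custom routing value, when provided, replaces the _id as the hash input.
    key = routing if routing is not None else doc_id
    # Stand-in hash (assumption): Elasticsearch uses its own hash function.
    hashed = zlib.crc32(key.encode("utf-8"))
    # The same key always lands on the same shard, so only that one shard
    # has to be consulted for the existence check.
    return hashed % num_primary_shards


shard = pick_shard("order-42", num_primary_shards=5)
print(f"document 'order-42' maps to shard {shard}")
```

Because the mapping is deterministic, no other shard needs to be asked whether the id already exists.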

danslapman · June 2, 2025, 1:54pm

Thank you very much for your reply!

DavidTurner · June 2, 2025, 3:03pm

There is not even anything you could call a "full index scan" within the single shard that owns the document ID. It's just checking the terms dictionary -- think something like a B-tree -- so it's only logarithmic effort.
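To get a feel for what "logarithmic effort" means here, the sketch below uses a sorted Python list and binary search as an analogy. This is an assumption made purely for illustration: Lucene's terms dictionary is a different data structure, and the helper names `id_exists` and `max_probes` are hypothetical.

```python
from bisect import bisect_left


def id_exists(sorted_ids, doc_id):
    """Binary-search a sorted list of ids; no scan over all entries."""
    pos = bisect_left(sorted_ids, doc_id)
    return pos < len(sorted_ids) and sorted_ids[pos] == doc_id


def max_probes(n):
    """Upper bound on comparisons a binary search needs for n sorted entries."""
    return max(1, n.bit_length())


ids = sorted(f"doc-{i:07d}" for i in range(100_000))
print(id_exists(ids, "doc-0004242"))   # present in the list
print(id_exists(ids, "doc-9999999"))   # absent from the list
print(max_probes(1_000_000))           # 20
print(max_probes(1_000_000_000))       # 30
```

Growing the number of ids a thousandfold adds only about ten more comparisons in this model, which is why the lookup stays cheap compared with scanning.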

stephenb · June 2, 2025, 5:50pm

Hi @danslapman

At the risk of being "corrected" by @DavidTurner

Note: In high-throughput ingest use cases, self-generated IDs combined with larger shard sizes might negatively affect performance. While using your own _id provides flexibility, it could impact ingest efficiency, so it is "not free", but it is as efficient as possible.

The "unusual" part of this is that you might get an "uneven" ingest pattern: when the shards are brand new there is little to no impact; as large shards approach their limit the impact can become larger; then rollover happens and the pattern resets.
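For reference, the difference between the two ingest styles is visible in the bulk request body itself. A minimal sketch, assuming a hypothetical helper `bulk_lines` and a made-up index name `events`: with `id_field` set, each action line carries an `_id` and triggers the existence check; without it, Elasticsearch generates the id itself.

```python
import json


def bulk_lines(index, docs, id_field=None):
    """Build an NDJSON body for the _bulk API (illustrative helper)."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if id_field is not None:
            # Self-generated id: the target shard must look it up first.
            meta["_id"] = str(doc[id_field])
        # Each document is an action line followed by its source line.
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"order": 1, "total": 9.5}, {"order": 2, "total": 12.0}]
print(bulk_lines("events", docs, id_field="order"))  # custom ids
print(bulk_lines("events", docs))                    # auto-generated ids
```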

danslapman · June 2, 2025, 6:05pm

Hi @stephenb

Thanks for the explanation!

Yes, I understand that auto-generated ids are more efficient. In my case, I have a system that already generates ids on the application side, and I wanted to verify my understanding before taking any optimization steps (we cannot switch to auto-generation quickly).
