Understanding the "id existence" check mechanism for custom ids

Greetings! I'd like to verify my understanding of what happens when a document is indexed with a custom _id. I know that ES performs an existence check in this case, but I'm curious about the details. As I understand it, ES does the following:

  1. determine the shard (by hashing the id or the provided routing key) where the document would be stored
  2. perform a lookup within that shard

So, roughly, my question is: "there is no full index scan in this case, right?"

Thanks in advance :folded_hands:

That does sound correct. Only the shard the document is being indexed into is searched.
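To illustrate the routing step, here is a minimal sketch, not Elasticsearch's actual code. It assumes a plain `hash % number_of_primary_shards` rule. Elasticsearch hashes the routing value with Murmur3 and also takes the index's routing-shard count into account; CRC32 is used here only to keep the example inside the standard library. The names `shard_for` and `num_primary_shards` are made up for this example.

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """Pick the one shard that could hold this document (simplified sketch)."""
    # The routing value defaults to the document id when no routing key is given.
    key = routing if routing is not None else doc_id
    # Stand-in hash: CRC32 instead of the Murmur3 hash Elasticsearch uses.
    hashed = zlib.crc32(key.encode("utf-8"))
    return hashed % num_primary_shards


# The same id always maps to the same shard, so only that shard is consulted.
print(shard_for("order-12345", num_primary_shards=5))
# A custom routing key replaces the id as the hash input.
print(shard_for("order-12345", num_primary_shards=5, routing="customer-42"))
```

Because the mapping is deterministic, the existence check for a given _id never needs to touch any other shard.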

Thank you very much for your reply!

There is not even anything you could call a "full index scan" within the single shard that owns the document ID. It's just checking the terms dictionary -- think something like a B-tree -- so it's only logarithmic effort.
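As a rough illustration of why the lookup is logarithmic, here is a toy model of a terms dictionary: a sorted list of ids searched with binary search. This is only a stand-in. Lucene's real terms dictionary is a more elaborate on-disk structure, and a shard consists of several segments, each with its own dictionary. The class name `ToyTermsDict` is made up for this example.

```python
from bisect import bisect_left
from typing import Iterable


class ToyTermsDict:
    """Toy stand-in for a per-segment terms dictionary on the _id field."""

    def __init__(self, ids: Iterable[str]) -> None:
        # Terms are kept sorted, which is what makes a logarithmic lookup possible.
        self._terms = sorted(set(ids))

    def contains(self, doc_id: str) -> bool:
        # Binary search: about log2(n) comparisons, not a scan of all n terms.
        i = bisect_left(self._terms, doc_id)
        return i < len(self._terms) and self._terms[i] == doc_id


terms = ToyTermsDict(f"doc-{n:07d}" for n in range(1_000_000))
print(terms.contains("doc-0421337"))  # id that was indexed
print(terms.contains("doc-9999999"))  # id that was never indexed
```

With a million ids, each check needs roughly 20 comparisons, which is the "logarithmic effort" described above.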


Hi @danslapman

At the risk of being "corrected" by @DavidTurner :slight_smile:

Note: In high-throughput ingest use cases, self-generated IDs combined with larger shard sizes might negatively affect performance. While using your own _id provides flexibility, it can reduce ingest efficiency. So it is "not free", but it is as efficient as possible.

The "unusual" part is that you might see an "uneven" ingest pattern: when the shards are brand new there is little to no impact; as large shards approach their limit the impact can grow; then rollover happens and the pattern resets.


Hi @stephenb

Thanks for the explanation!

Yes, I understand that auto-generated ids are more efficient. In my case, the system already generates ids on the application side, and I wanted to verify my understanding before doing any optimization work (we cannot switch to auto-generation quickly).
