Performance - Querying against _id versus _source content

Hi,

I have read that indexing is faster when no ID is specified, because Elasticsearch does not have to check for duplicates.
However, is it worth indexing with a chosen _id if the document has to be retrieved multiple times in the future? Is it faster to get a document by _id than to search on an "id" field stored in the _source section?

Thanks,

edit: I cannot add a tag like "performance" therefore I put that in the title :confused:

It's faster to GET a document directly by ID than to search for it, mainly because a GET can go straight to the appropriate shard and look up the document, whereas a search has to touch all the shards in parallel and look up the term on each one to find the document. The search probably won't be exceptionally slow, but get-by-ID should always be faster.
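To make the shard argument concrete, here is a minimal toy model in Python. It is not how Elasticsearch is implemented: `ToyIndex`, `NUM_SHARDS`, and the `user-N` IDs are made up for illustration, `zlib.crc32` stands in for the Murmur3 hash Elasticsearch applies to the routing value, and the per-shard scan stands in for a term lookup in each shard's inverted index. It only counts how many shards each kind of lookup has to consult.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards


class ToyIndex:
    """In-memory stand-in for a sharded index (illustration only)."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [{} for _ in range(num_shards)]
        self.shards_touched = 0

    def _shard_for(self, doc_id):
        # Assumption: crc32 replaces the real Murmur3 routing hash.
        # The idea is the same: shard = hash(routing value) % number of shards.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, source):
        self.shards[self._shard_for(doc_id)][doc_id] = source

    def get(self, doc_id):
        # Get-by-ID: the shard is computed from the ID, so one shard is consulted.
        self.shards_touched = 1
        return self.shards[self._shard_for(doc_id)].get(doc_id)

    def search_term(self, field, value):
        # Search on a _source field: the owning shard is unknown, so ask them all.
        self.shards_touched = 0
        hits = []
        for shard in self.shards:
            self.shards_touched += 1
            hits.extend(src for src in shard.values() if src.get(field) == value)
        return hits


idx = ToyIndex()
for i in range(20):
    idx.index(f"user-{i}", {"id": f"user-{i}", "n": i})

doc = idx.get("user-7")
touched_by_get = idx.shards_touched

hits = idx.search_term("id", "user-7")
touched_by_search = idx.shards_touched
```

Both lookups return the same document, but the get consults one shard while the search consults all of them.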

I wouldn't worry too much about the indexing performance of autogenerated IDs versus user-defined IDs. There's a bit of a difference, but it isn't immense. I tell people that if you have a natural ID for a document, use it, because it's likely you'll want to get-by-ID at some point. If there's no natural ID, go ahead and use the autogenerated one.
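For reference, the requests involved can be sketched like this. The helper names (`index_request`, `get_request`, `term_search_request`) and the `users` index are hypothetical, the functions only build the method, path, and body without sending anything, and the `_doc` paths assume a typeless (7.x or later) cluster.

```python
import json
from urllib.parse import quote


def index_request(index, source, doc_id=None):
    """Return (method, path, body) for indexing one document."""
    body = json.dumps(source)
    if doc_id is None:
        # No natural ID: POST without an ID and let Elasticsearch generate one.
        return ("POST", f"/{index}/_doc", body)
    # Natural ID available: PUT stores the document under that _id.
    return ("PUT", f"/{index}/_doc/{quote(doc_id, safe='')}", body)


def get_request(index, doc_id):
    """Get-by-ID: addressed directly by _id."""
    return ("GET", f"/{index}/_doc/{quote(doc_id, safe='')}", None)


def term_search_request(index, field, value):
    """Search on a field kept in _source, e.g. a custom "id" field."""
    body = json.dumps({"query": {"term": {field: value}}})
    return ("POST", f"/{index}/_search", body)


auto = index_request("users", {"name": "a"})
natural = index_request("users", {"name": "a"}, doc_id="42")
by_id = get_request("users", "42")
by_field = term_search_request("users", "id", "42")
```

With a natural ID you index with the PUT form and can later use the GET form. With an autogenerated ID you would need to store the returned _id somewhere, or fall back to the search form.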

