Performance - Querying against _id versus _source content

freaka · February 11, 2019, 1:18pm

Hi,

I have read that indexing is faster when no ID is specified (Elasticsearch does not have to check for duplicates).
However, is it worth indexing with a chosen _id if the document has to be retrieved multiple times in the future? Is it faster to get a document by _id than to search on an "id" field in the _source section?

Thanks,

Edit: I cannot add a tag like "performance", so I put it in the title.

polyfractal · February 15, 2019, 7:57pm

Getting a document directly by ID is faster than searching for it. The main reason is that a GET can be routed straight to the shard that holds the document and look it up there, whereas a search has to touch all the shards in parallel and look up the term on each one to find the document. The search probably won't be exceptionally slow, but the get-by-ID should always be faster.
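To make the routing argument concrete, here is a toy Python model (not Elasticsearch code). It assumes a hypothetical index with 5 primary shards and uses CRC32 as a stand-in for the routing hash; Elasticsearch uses its own hash function, and a real search consults each shard's inverted index rather than scanning documents. The only point is to count how many shards each access path touches.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the document ID.

    CRC32 is only a stand-in here; it is NOT the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ToyIndex:
    """A toy sharded index: each shard is a dict of doc_id -> source."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def index(self, doc_id: str, source: dict) -> None:
        # The ID decides which shard stores the document.
        self.shards[shard_for(doc_id, len(self.shards))][doc_id] = source

    def get(self, doc_id: str):
        """Get-by-ID: hash the ID, visit exactly one shard."""
        shard = self.shards[shard_for(doc_id, len(self.shards))]
        return shard.get(doc_id), 1  # (source, shards touched)

    def search_by_field(self, field: str, value):
        """Search on a field in the source: every shard must be asked."""
        hits, touched = [], 0
        for shard in self.shards:
            touched += 1
            # Simplification: a real shard would look the term up in its
            # inverted index instead of scanning all documents.
            hits.extend(src for src in shard.values() if src.get(field) == value)
        return hits, touched


idx = ToyIndex()
idx.index("user-42", {"id": "user-42", "name": "example"})

doc, shards_touched_by_get = idx.get("user-42")
hits, shards_touched_by_search = idx.search_by_field("id", "user-42")
print(shards_touched_by_get, shards_touched_by_search)
```

In this model the get touches one shard while the search touches all five, which is the difference described above; the actual latency gap on a real cluster depends on shard count, hardware, and caching.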

I wouldn't worry too much about the performance of autogenerated ID vs user-defined ID. There's a bit of a difference, but it isn't immense. I tell people that if you have a natural ID for a document... use that because it's likely you'll want to get-by-ID at some point. But if there's no natural ID, then go ahead and use the autogenerated version.
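For reference, a small Python sketch of what the request shapes look like in each case. It only builds the method, path, and body and sends nothing; the index name `my-index`, the field name `id`, and the helper function names are assumptions made for illustration, and the paths follow the `_doc` endpoints of the document and search REST APIs.

```python
import json
from typing import Optional, Tuple
from urllib.parse import quote

INDEX = "my-index"  # hypothetical index name

Request = Tuple[str, str, str]  # (HTTP method, path, JSON body)


def index_request(source: dict, doc_id: Optional[str] = None) -> Request:
    """Index a document, with a natural ID if there is one."""
    body = json.dumps(source)
    if doc_id is None:
        # No ID given: POST lets Elasticsearch autogenerate one.
        return "POST", f"/{INDEX}/_doc", body
    # Natural ID: PUT the document under that ID.
    return "PUT", f"/{INDEX}/_doc/{quote(doc_id, safe='')}", body


def get_request(doc_id: str) -> Request:
    """Get-by-ID: only possible if you know the _id."""
    return "GET", f"/{INDEX}/_doc/{quote(doc_id, safe='')}", ""


def search_request(field: str, value: str) -> Request:
    """Fallback when the ID lives only in a source field: a term query."""
    body = json.dumps({"query": {"term": {field: value}}})
    return "POST", f"/{INDEX}/_search", body


print(index_request({"name": "example"}, doc_id="user-42"))
print(index_request({"name": "example"}))
print(get_request("user-42"))
print(search_request("id", "user-42"))
```

With an autogenerated ID, the application has to store the returned `_id` somewhere to use the get path later; otherwise it ends up on the search path.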

