Difference between GET by document id, and running a query matching on document id?


#1

Is there a performance difference when retrieving a document by:

  1. a GET request by document ID, i.e.:

    GET /my_index/my_type/12345

  2. a _search query on the document ID, i.e.:

    GET /my_index/my_type/_search
    {
        "query": {
            "term": {
                "_id": {
                    "value": 12345
                }
            }
        }
    }

(Nik Everett) #2

GET by a document ID is automatically routed to the shard with that document because the routing value comes from the document's ID.

Any query on _id or _uid has to go to all the shards. In fact, _id could be fairly inefficient because it has to be rewritten to a bool query with a bunch of queries on _uid in the should clause. If you query on _uid and add routing to the request you get most of the efficiency back.
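To make the routing point concrete, here is a rough Python sketch, not Elasticsearch's actual code. The shard count of 5, the CRC32 hash (a stand-in for whatever hash Elasticsearch really uses), and the helper names are all assumptions for illustration. The `type#id` form of the `_uid` value and the `routing` query parameter are my reading of how a routed `_uid` search would be written.

```python
import json
import zlib
from typing import List, Optional

NUM_PRIMARY_SHARDS = 5  # assumed shard count, for illustration only


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Stand-in hash: Elasticsearch uses its own hash function, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shards_for_get(doc_id: str) -> List[int]:
    # GET by ID: the routing value defaults to the document ID,
    # so exactly one shard is consulted.
    return [shard_for(doc_id)]


def shards_for_search(routing: Optional[str] = None) -> List[int]:
    # A search without routing fans out to every shard;
    # with routing it is limited to the shard that value hashes to.
    if routing is None:
        return list(range(NUM_PRIMARY_SHARDS))
    return [shard_for(routing)]


def routed_uid_search(index: str, doc_type: str, doc_id: str):
    # Hypothetical helper: builds the path and body of a _uid term query
    # that carries the document ID as its routing value.
    path = f"/{index}/{doc_type}/_search?routing={doc_id}"
    body = {"query": {"term": {"_uid": f"{doc_type}#{doc_id}"}}}
    return path, json.dumps(body)
```

With these definitions, `shards_for_search()` lists all five shards, while `shards_for_search(routing="12345")` and `shards_for_get("12345")` both name the same single shard. That is the sense in which adding routing gets most of the efficiency back.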

One difference is the real-time vs. not real-time aspect. GET by document ID will always return the most recent version of that document we have, even if that version isn't in the Lucene index yet. Search will only search what has been refreshed.

Exactly how GET by document ID does that differs depending on version. In all versions before the next 5.0 beta, the document is read from the translog if it is not in the Lucene index. In the next 5.0 beta, and in Elasticsearch versions built from the 5.0, 5.x, and master branches, Elasticsearch will force a refresh to push the document into the Lucene index and then read the Lucene index. The new way is much less efficient for the GET itself, but it is much simpler and allows us to optimize the write path much better.
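A toy Python model of that visibility difference, assuming the older read-from-unrefreshed-writes behaviour. The `ToyShard` class and its method names are made up for illustration and heavily simplified; real shards, translogs, and refreshes involve far more than two dictionaries.

```python
class ToyShard:
    """Toy model: writes become searchable only after refresh()."""

    def __init__(self):
        self.unrefreshed = {}  # written, but not yet visible to search
        self.searchable = {}   # stands in for the refreshed Lucene index

    def index(self, doc_id, source):
        # New writes land in the unrefreshed buffer first.
        self.unrefreshed[doc_id] = source

    def refresh(self):
        # Make everything written so far visible to search.
        self.searchable.update(self.unrefreshed)
        self.unrefreshed.clear()

    def get(self, doc_id):
        # Real-time GET: prefer the newest unrefreshed write if there is one.
        if doc_id in self.unrefreshed:
            return self.unrefreshed[doc_id]
        return self.searchable.get(doc_id)

    def search_by_id(self, doc_id):
        # Search only sees what has been refreshed.
        return self.searchable.get(doc_id)
```

Right after `index()`, `get()` returns the document while `search_by_id()` returns `None`; once `refresh()` has run, both return it.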

