Should I route "get by _id" queries to improve performance?

I have an Elasticsearch index with hundreds of millions of documents that is used mostly for "get by id" queries.

I am considering adding a routing value to documents when indexing. The route will be a random number from 0 to 9.

Later this route will appear in the URL together with the document id, and both will be used to get the document. Currently I have only document ids in the URLs but plan to add routes as well. The new URL will look like this: https://tarta.ai/j/[route]/[doc id].

I'm wondering whether this will decrease the time needed to find a document in the index. My assumption is that Elasticsearch in this case won't look for a document in all the shards but instead will look only in the shard that holds this particular route.

Some specs:

  • index size is 110 GB.
  • the number of docs is 36 million, but we're adding hundreds of thousands every day.
  • 5 shards.
  • a VM with 16 GB RAM, 2 cores, and a 1 TB SSD.

The document ID determines which shard the document resides in, so if you do a GET <index>/<type>/<id>, only a single shard will be queried. As far as I can see, your approach would therefore add complexity without improving performance at all.
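
To illustrate why, here is a minimal Python sketch of the routing rule in its simplified form, shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The function names are hypothetical, and CRC32 is only a stand-in for the hash Elasticsearch really uses (a Murmur3 variant), so the shard numbers it produces will not match a real cluster.

```python
import zlib
from typing import List, Optional

NUM_PRIMARY_SHARDS = 5  # taken from the specs in the question


def stand_in_hash(value: str) -> int:
    # Assumption: CRC32 replaces Elasticsearch's Murmur3-based hash here,
    # purely so the sketch runs with the standard library.
    return zlib.crc32(value.encode("utf-8"))


def shard_for(routing_value: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Simplified routing rule: one routing value maps to exactly one shard.
    return stand_in_hash(routing_value) % num_shards


def shards_queried_for_get(doc_id: str, route: Optional[str] = None) -> List[int]:
    # Without a custom route, the document ID itself is the routing value.
    routing_value = route if route is not None else doc_id
    return [shard_for(routing_value)]


# Hypothetical document ID, used only for illustration.
print(shards_queried_for_get("job-12345"))             # default routing by ID
print(shards_queried_for_get("job-12345", route="7"))  # custom route 0..9
```

In both calls the list contains a single shard, which is the point: a get by ID already goes to one shard, so a custom route does not narrow the lookup any further. A custom route would also have to be supplied on every get, because the ID alone would no longer identify the shard.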
