Performance of a wildcard index search for _id

Munawwar1 · September 25, 2020, 8:35am

Hi all.
I am running a term query on the _id field (which happens to be a UUID in my case) across all indexes, using a wildcard index pattern.

e.g.

GET http://elasticsearch.endpoint/prod-*/_search
{
    "query": {
        "term": {
            "_id": "d7e700ff-0c0e-49cb-9e14-b02f4121f3ff"
        }
    }
}

We already have ~20 indexes, and as we onboard more clients/customers I can imagine hitting 100+ indexes. The response time currently seems fast (the "took" property in the response shows ~50 ms).
But I am wondering what will happen as the number of indexes increases. Would the performance of this query degrade? What is the time complexity of this search? Or is there nothing to be concerned about?

spinscale · September 28, 2020, 9:03am

By its nature, this query has to look for the document in every index, in every shard.

If you have a use case like time-based indexes, or another criterion to limit your searches, this might actually help select the shards the query should be executed on.

For example, use a range query to filter on a timestamp, or use the constant_keyword field type in combination with a terms filter.
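A minimal sketch of such a pre-filter, written as Python dicts that mirror the JSON request bodies. The field names `customer` and `@timestamp`, and the value `acme`, are hypothetical. The idea is that a `constant_keyword` field holds a single value for the whole index, so a filter on it can let Elasticsearch rule out indexes that cannot match instead of searching them.

```python
import json


def customer_index_mapping(customer: str) -> dict:
    """Hypothetical per-customer index mapping.

    Every document in the index shares one 'customer' value, which
    constant_keyword lets us fix up front in the mapping.
    """
    return {
        "mappings": {
            "properties": {
                "customer": {"type": "constant_keyword", "value": customer},
                "@timestamp": {"type": "date"},
            }
        }
    }


def scoped_id_search(doc_id: str, customers: list, since: str) -> dict:
    """Term query on _id, narrowed by filters that can exclude whole indexes."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"_id": doc_id}},
                    # terms filter on the (hypothetical) constant_keyword field
                    {"terms": {"customer": customers}},
                    # range filter on a timestamp, e.g. "now-30d" in date math
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = scoped_id_search("d7e700ff-0c0e-49cb-9e14-b02f4121f3ff", ["acme"], "now-30d")
    print(json.dumps(body, indent=2))
```

The body would be sent to the same `prod-*/_search` endpoint as before. Only the filters change.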

Hope that helps as a start.

--Alex

Munawwar1 · September 29, 2020, 3:09pm

Oh. In every shard? That's a bit of a surprise, as I thought ES could efficiently route an id to a specific shard (after all, sharding itself is based on the id).

The idea was to find a document with the least meta info. It is needed for fetching a document whose id has to be short enough to fit into a tiny URL (e.g. as part of an SMS or WhatsApp message), so I decided to leave the index name out of the tiny URL. So I think I might have to refactor this to add the index name, and not do a wildcard index search.
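One way to keep the token short while still carrying the index is to pack the UUID's 16 raw bytes into URL-safe base64 (22 characters instead of 36) and prefix a short index code. This is a hypothetical sketch: the `INDEX_CODES` table, the `prod-customer-*` names and the `.` separator are all assumptions.

```python
import base64
import uuid

# Hypothetical table of short codes -> full index names.
INDEX_CODES = {"a1": "prod-customer-a", "b2": "prod-customer-b"}
CODE_FOR_INDEX = {name: code for code, name in INDEX_CODES.items()}


def encode_token(index: str, doc_id: str) -> str:
    """Pack an index code and a UUID document id into a short URL-safe token."""
    raw = uuid.UUID(doc_id).bytes  # 16 raw bytes
    b64 = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")  # 22 chars
    # '.' never appears in the base64url alphabet, so it is a safe separator.
    return f"{CODE_FOR_INDEX[index]}.{b64}"


def decode_token(token: str) -> tuple:
    """Recover (index, doc_id) so the document can be fetched directly."""
    code, b64 = token.split(".", 1)
    raw = base64.urlsafe_b64decode(b64 + "==")  # restore the stripped padding
    return INDEX_CODES[code], str(uuid.UUID(bytes=raw))
```

With the index recovered from the token, the document can be fetched with a plain get by id (`GET /<index>/_doc/<id>`) instead of a wildcard search.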

spinscale · September 30, 2020, 8:45am

By default, you search across every shard involved in a query.

However, by using routing you could reduce this to the one shard per index that the routing value points to (so this only makes sense if you have more than one primary shard per index). See https://www.elastic.co/guide/en/elasticsearch/reference/7.9/mapping-routing-field.html#_searching_with_custom_routing
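A rough way to answer the time-complexity question is to count shard-level searches. This is a toy model, not Elasticsearch code. It assumes one copy of each shard is queried, and that a routing value narrows each index to exactly one shard. The index names and shard counts are made up.

```python
def shard_requests(primary_shards: dict, routed: bool) -> int:
    """Count shard-level searches for a wildcard search across these indexes.

    primary_shards maps index name -> number of primary shards.
    Without routing, every shard of every matching index is searched.
    With a routing value, only one shard per index is searched.
    """
    if routed:
        return len(primary_shards)
    return sum(primary_shards.values())


if __name__ == "__main__":
    # Hypothetical clusters: 100 indexes with 1 primary shard vs 5 primary shards each.
    one_shard = {f"prod-{i}": 1 for i in range(100)}
    five_shards = {f"prod-{i}": 5 for i in range(100)}
    print(shard_requests(one_shard, routed=False), shard_requests(one_shard, routed=True))  # prints: 100 100
    print(shard_requests(five_shards, routed=False), shard_requests(five_shards, routed=True))  # prints: 500 100
```

In this model the work grows linearly with the number of indexes either way. Routing only removes the per-index shard multiplier, which is why it does nothing for single-shard indexes.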

The routing you are referring to happens by default only when a document is indexed, to pick its shard.

Hope that makes sense.

Munawwar1 · September 30, 2020, 3:56pm

I just discovered the ids query -> https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-ids-query.html

Would this perform the same?

spinscale · October 1, 2020, 7:18am

Yes, and it can be optimized by adding the routing parameter as well.
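A sketch of building that request with only the Python standard library: the `ids` query body plus the `routing` URL parameter. Because the document id is also the default routing value, the same UUID goes in both places. The endpoint and index pattern are the placeholders used in this thread, and actually sending the request is left out.

```python
import json
from urllib.parse import quote, urlencode


def ids_search_request(endpoint: str, index_pattern: str, doc_id: str):
    """Return (url, json_body) for an ids query routed by the document id."""
    # Keep '*' and ',' unescaped so wildcard / multi-index patterns survive.
    path = quote(index_pattern, safe="*,")
    url = f"{endpoint}/{path}/_search?" + urlencode({"routing": doc_id})
    body = {"query": {"ids": {"values": [doc_id]}}, "size": 1}
    return url, json.dumps(body)
```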

Munawwar1 · October 11, 2020, 10:27am

OK, so this query seems to hit exactly one shard per index:

GET http://elasticsearch.endpoint/prod-*/_search?routing=b4716a3f-70e0-46b0-b55f-b6cbdee670d2
{
  "profile": true,
  "query": {
    "ids": {
      "values" : ["b4716a3f-70e0-46b0-b55f-b6cbdee670d2"]
    }
  },
  "size": 1
}
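To check the shard count programmatically rather than by eye, the `profile` section of the response can be grouped by index. This sketch assumes each entry in `profile.shards` has an `id` of the form `[nodeId][indexName][shardNumber]`. Treat that format as an assumption to verify against your own responses.

```python
import re
from collections import Counter

# Assumed format of profile shard ids: "[nodeId][indexName][shardNumber]"
SHARD_ID = re.compile(r"^\[([^\]]*)\]\[([^\]]*)\]\[(\d+)\]$")


def shards_per_index(response: dict) -> Counter:
    """Count how many shards of each index a profiled search touched."""
    counts = Counter()
    for shard in response.get("profile", {}).get("shards", []):
        match = SHARD_ID.match(shard.get("id", ""))
        if match:
            counts[match.group(2)] += 1  # group 2 is the index name
    return counts
```

If every value in the returned counter is 1, the routed search touched one shard per index.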

Unlike what the documentation seems to suggest, I didn't have to add the routing parameter explicitly when indexing the documents.

Christian_Dahlqvist · October 11, 2020, 10:34am

The document id is used for routing by default.
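Roughly, the shard is picked from a hash of the `_routing` value, which defaults to `_id`. The sketch below follows the shape of the 7.x formula (`routing_factor = num_routing_shards / num_primary_shards`, `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`). It swaps Elasticsearch's Murmur3 hash for `zlib.crc32`, so it will not reproduce real shard numbers. It only shows why an id lands on a predictable shard without any explicit routing at index time.

```python
import zlib
from typing import Optional


def pick_shard(doc_id: str, num_primary_shards: int, num_routing_shards: int,
               routing: Optional[str] = None) -> int:
    """Toy version of default shard selection.

    The routing value defaults to the document id, as in Elasticsearch.
    zlib.crc32 stands in for Murmur3, so results differ from a real cluster.
    """
    if num_routing_shards % num_primary_shards:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_value = doc_id if routing is None else routing
    routing_factor = num_routing_shards // num_primary_shards
    h = zlib.crc32(routing_value.encode("utf-8"))  # non-negative 32-bit hash
    return (h % num_routing_shards) // routing_factor
```

Passing `routing` equal to the id gives the same shard as omitting it. Each index computes its shard from that same value, which is why `?routing=<id>` narrows a wildcard search to one shard per index.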

spinscale · October 14, 2020, 7:33am

By default the document id is used for routing, as Christian (hey!) mentioned. There are use cases like the join field type where routing needs to be set explicitly, and you can configure the mapping to force every index operation to contain an explicit routing value; this is disabled by default though.
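The mapping option in question is `_routing` with `required: true`. With it set, document requests that omit a routing value are rejected. Below is a minimal sketch of the mapping body as a Python dict, plus a client-side guard that mirrors that rule. The index name, id and routing value in the usage are hypothetical.

```python
from urllib.parse import urlencode

# Mapping body that makes a routing value mandatory for document operations.
REQUIRED_ROUTING_MAPPING = {"mappings": {"_routing": {"required": True}}}


def index_doc_path(index: str, doc_id: str, routing: str) -> str:
    """Path for PUT /<index>/_doc/<id>?routing=<value>; routing must be given."""
    if not routing:
        # Mirrors the server-side check so the mistake is caught before sending.
        raise ValueError("this index requires an explicit routing value")
    return f"/{index}/_doc/{doc_id}?" + urlencode({"routing": routing})


if __name__ == "__main__":
    print(index_doc_path("prod-a", "1", "user1"))  # prints: /prod-a/_doc/1?routing=user1
```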

system · November 11, 2020, 7:33am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Ids query takes up to 1 min on cloud elastic	Elasticsearch	3	849	July 5, 2017
Should I route "get by _id" queries to improve performance?	Elasticsearch	2	349	July 23, 2021
What's faster /index/_doc/id or /index/_search with id query	Elasticsearch	4	1765	January 8, 2020
Ids Query is slow in 2b docs	Elasticsearch	2	820	September 16, 2018
Improve indexing performance speed by routing to a specific shard	Elasticsearch	8	222	March 27, 2023