Performance of a wildcard index search for _id

Hi all.
I am running a term query on the _id field (which happens to be a UUID in my case) across all indexes, using a wildcard index name.

e.g.

GET http://elasticsearch.endpoint/prod-*/_search
{
    "query": {
        "term": {
            "_id": "d7e700ff-0c0e-49cb-9e14-b02f4121f3ff"
        }
    }
}

We already have ~20 indexes, and as we add more clients/customers I can imagine hitting 100+ indexes. The response time currently seems fast (the "took" property in the response shows ~50 ms).
But I am wondering what happens when the number of indexes increases. Would the performance of this query degrade? What is the time complexity of this search? Or is there nothing to be concerned about?

The nature of this query requires Elasticsearch to look for the document in every index, in every shard.

If you have a use case like time-based indices, or some other criterion to limit your searches, that can help narrow down the shards this query is executed on.

For example, you could use a range query on a timestamp, or the constant_keyword field type in combination with a terms filter.
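A minimal sketch of the range-filter idea, with Python used only to assemble the JSON search body. The `created_at` field name and the `now-30d` window are assumptions for illustration; use whatever timestamp field your mapping actually has. The intent is that shards which cannot contain documents in that range may be skipped.

```python
import json


def build_id_lookup(doc_id, timestamp_field="created_at", since="now-30d"):
    """Build a search body that looks up one _id and also filters on a time range.

    timestamp_field and since are hypothetical placeholders, not values from this thread.
    """
    return {
        "size": 1,
        "query": {
            "bool": {
                # filter clauses do not contribute to scoring
                "filter": [
                    {"term": {"_id": doc_id}},
                    {"range": {timestamp_field: {"gte": since}}},
                ]
            }
        },
    }


body = build_id_lookup("d7e700ff-0c0e-49cb-9e14-b02f4121f3ff")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same `prod-*/_search` request shown in the question.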

Hope that helps as a start.

--Alex

Oh. In every shard? That's a bit of a surprise, as I thought ES could efficiently route an id to a specific shard (I mean, the way sharding itself works is based on the id).

The idea was to find a document with the least possible meta info. The lookup is needed to fetch a document whose id has to be short enough to fit into a tiny URL (for example, as part of an SMS or WhatsApp message), so I decided to leave the index name out of the tiny URL. So I think I might have to refactor this to include the index name and not do a wildcard index search.

By default, you search across every shard involved in a query.

However, by using routing you can reduce this to a single shard per index (so this only makes sense if you have more than one primary shard per index). See https://www.elastic.co/guide/en/elasticsearch/reference/7.9/mapping-routing-field.html#_searching_with_custom_routing

The routing you are referring to happens, by default, only when a document is indexed, to pick the shard.
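To illustrate why a routing value maps to exactly one shard per index: the documented formula is roughly `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below uses `zlib.crc32` as a stand-in hash, which is an assumption made purely for illustration. Elasticsearch uses its own hash function, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number in [0, num_primary_shards).

    crc32 is only a stand-in hash for this sketch, not what Elasticsearch uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


doc_id = "d7e700ff-0c0e-49cb-9e14-b02f4121f3ff"

# The same id always lands on the same shard for a given shard count,
# which is why passing it as the routing value narrows a search to one shard per index.
for shards in (1, 3, 5):
    print(f"{shards} primary shard(s) -> shard {pick_shard(doc_id, shards)}")
```

Because each index applies the formula with its own shard count, a wildcard search with routing still visits every matching index, just one shard in each.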

Hope that makes sense.

I just discovered the ids query -> https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-ids-query.html

Would this be the same in perf?

Yes, and it can be optimized by adding the routing parameter as well.

Ok so this query seems to hit exactly one shard per index:

GET http://elasticsearch.endpoint/prod-*/_search?routing=b4716a3f-70e0-46b0-b55f-b6cbdee670d2
{
  "profile": true,
  "query": {
    "ids": {
      "values" : ["b4716a3f-70e0-46b0-b55f-b6cbdee670d2"]
    }
  },
  "size": 1
}
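Since the routing value and the id in the body have to match for this to find the document, a small helper can keep them in sync. This is only a sketch: `build_routed_id_search` is a hypothetical name, and it just assembles the URL and body without sending anything.

```python
import json
from urllib.parse import urlencode


def build_routed_id_search(base_url, index_pattern, doc_id):
    """Return (url, body) for an ids query routed by the same document id.

    Hypothetical helper for illustration; it performs no HTTP request.
    """
    query_string = urlencode({"routing": doc_id})
    url = f"{base_url}/{index_pattern}/_search?{query_string}"
    body = {"query": {"ids": {"values": [doc_id]}}, "size": 1}
    return url, body


url, body = build_routed_id_search(
    "http://elasticsearch.endpoint",
    "prod-*",
    "b4716a3f-70e0-46b0-b55f-b6cbdee670d2",
)
print(url)
print(json.dumps(body, indent=2))
```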

Unlike what the documentation seems to suggest, I didn't have to add a routing parameter explicitly while indexing the documents.

The document id is used for routing by default.

By default the document id is used for routing, as Christian (hey!) mentioned. There are use cases, like the join field type, where the routing needs to be set explicitly, and you can configure the mapping to force index operations to contain an explicit routing value. This is disabled by default, though.
