Index selection optimisation when _index filter is used

Take a query like the one below, where the index pattern in the request path has a wide scope, but we are only interested in results from a subset of the indices.

GET 123-logs-*/_search?size=1
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": ["message"],
            "query": "*error*"
          }
        },
        {
          "term": {
            "_index": {
              "value": "*logs-2020*"
            }
          }
        }
      ]
    }
  }
}

Technically speaking, the narrower index pattern (logs-2020) that applies to the subset of results could have been used in the request path itself, avoiding a filter on _index, at least in this case. But because of the way we let our users build queries on their data, and the way we convert them to ES queries, we end up with queries like this.
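To illustrate what doing this on the client side could look like, here is a minimal Python sketch that narrows the request path before the query is sent. It assumes the client already has a list of concrete index names (the names below are made up), and that both patterns use only `*` as a wildcard. The helper names `wildcard_to_regex` and `narrow_indices` are hypothetical, not part of any Elasticsearch client.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern that uses only `*` as a wildcard into a compiled regex."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def narrow_indices(index_names, path_pattern, index_filter):
    """Return the index names that match both the path pattern and the _index filter."""
    path_re = wildcard_to_regex(path_pattern)
    filter_re = wildcard_to_regex(index_filter)
    return [
        name
        for name in index_names
        if path_re.fullmatch(name) and filter_re.fullmatch(name)
    ]


# Hypothetical index names; a real client would fetch these from the cluster.
known = [
    "123-logs-2019.12",
    "123-logs-2020.01",
    "123-logs-2020.02",
    "456-logs-2020.01",
]

targets = narrow_indices(known, "123-logs-*", "*logs-2020*")
request_path = "/" + ",".join(targets) + "/_search"
print(request_path)
```

With the request path narrowed this way, the `_index` term clause could be dropped from the `must` array. A real client would also need to handle the case where no index matches, since an empty list would produce an invalid path.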

The order of this filter in the must clause seems to have some impact on performance, in that the query_string filter is evaluated on indices that don't match the _index filter pattern.

When I profiled this query, I could see that a MatchNoDocsQuery is added, but also that query execution spends time on indices the user is not interested in.

                     {
                        "type": "BooleanQuery",
                        "description": "+message:*error* +MatchNoDocsQuery(\"Index didn't match. Index queried: abcxyz vs. [2a 5f 63 36 33 32 35 62 2d 63 72 30 30 33 34 2d 61 67 69 6c 65 6f 6e 65 5f 70 65 67 61 70 72 6f 6a 6d 67 6d 74 5f 77 6f 72 6b 5f 62 75 67 5f 64 64]\")",
                        "time_in_nanos": 2013705,
                        "breakdown": {
                           "set_min_competitive_score_count": 0,
                           "match_count": 0,
                           "shallow_advance_count": 0,
                           "set_min_competitive_score": 0,
                           "next_doc": 0,
                           "match": 0,
                           "next_doc_count": 0,
                           "score_count": 0,
                           "compute_max_score_count": 0,
                           "compute_max_score": 0,
                           "advance": 0,
                           "advance_count": 0,
                           "score": 0,
                           "build_scorer_count": 3,
                           "create_weight": 6526,
                           "shallow_advance": 0,
                           "create_weight_count": 1,
                           "build_scorer": 2007175
                        },
                        "children": [
                           {
                              "type": "MultiTermQueryConstantScoreWrapper",
                              "description": "message:*error*",
                              "time_in_nanos": 2001679,
                              "breakdown": {
                                 "set_min_competitive_score_count": 0,
                                 "match_count": 0,
                                 "shallow_advance_count": 0,
                                 "set_min_competitive_score": 0,
                                 "next_doc": 0,
                                 "match": 0,
                                 "next_doc_count": 0,
                                 "score_count": 0,
                                 "compute_max_score_count": 0,
                                 "compute_max_score": 0,
                                 "advance": 0,
                                 "advance_count": 0,
                                 "score": 0,
                                 "build_scorer_count": 3,
                                 "create_weight": 379,
                                 "shallow_advance": 0,
                                 "create_weight_count": 1,
                                 "build_scorer": 2001296
                              }
                           },
                           {
                              "type": "MatchNoDocsQuery",
                              "description": "MatchNoDocsQuery(\"Index didn't match. Index queried: abcxyz vs. [2a 5f 63 36 33 32 35 62 2d 63 72 30 30 33 34 2d 61 67 69 6c 65 73 74 75 64 69 6f 2d 73 74 6e 67 2d 61 67 69 6c 65 73 74 75 64 69 6f 2d 73 74 67 31 2d 63 6c 6f 6e 65 5f 70 65 67 61 70 72 6f 6a 6d 67 6d 74 5f 77 6f 72 6b 5f 62 75 67 5f 64 64]\")",
                              "time_in_nanos": 513,
                              "breakdown": {
                                 "set_min_competitive_score_count": 0,
                                 "match_count": 0,
                                 "shallow_advance_count": 0,
                                 "set_min_competitive_score": 0,
                                 "next_doc": 0,
                                 "match": 0,
                                 "next_doc_count": 0,
                                 "score_count": 0,
                                 "compute_max_score_count": 0,
                                 "compute_max_score": 0,
                                 "advance": 0,
                                 "advance_count": 0,
                                 "score": 0,
                                 "build_scorer_count": 2,
                                 "create_weight": 213,
                                 "shallow_advance": 0,
                                 "create_weight_count": 1,
                                 "build_scorer": 297
                              }
                           }
                        ]
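To see how much query time each index accounts for, the profile output can be summed per index. The following is a rough Python sketch. It assumes the response has the usual `profile.shards[].searches[].query[]` layout and that each shard `id` has the form `[node][index][shard]`. The function name `query_time_per_index` and the second shard in the sample data are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed shard id layout: "[node_id][index_name][shard_number]".
SHARD_ID = re.compile(r"^\[[^\]]*\]\[(?P<index>[^\]]*)\]\[\d+\]$")


def query_time_per_index(response):
    """Sum the top-level query time_in_nanos of every profiled shard, grouped by index."""
    totals = defaultdict(int)
    for shard in response.get("profile", {}).get("shards", []):
        match = SHARD_ID.match(shard.get("id", ""))
        index = match.group("index") if match else "unknown"
        for search in shard.get("searches", []):
            # Only top-level entries are summed: a parent's time already
            # includes the time of its children.
            for query in search.get("query", []):
                totals[index] += query.get("time_in_nanos", 0)
    return dict(totals)


# Trimmed sample: the first shard uses the BooleanQuery timing shown above,
# the second shard is hypothetical.
sample = {
    "profile": {
        "shards": [
            {
                "id": "[node-1][abcxyz][0]",
                "searches": [
                    {"query": [{"type": "BooleanQuery", "time_in_nanos": 2013705}]}
                ],
            },
            {
                "id": "[node-1][logs-2020.01][0]",
                "searches": [
                    {"query": [{"type": "BooleanQuery", "time_in_nanos": 500}]}
                ],
            },
        ]
    }
}

per_index = query_time_per_index(sample)
print(per_index)
```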

I would have expected ES to apply some kind of optimisation here and avoid processing this query against the non-relevant indices when the query is rewritten internally.

Maybe I am oversimplifying, so I would like to know: is the onus for choosing the optimal index pattern in the request path, and for ordering the "_index" filter in the must clause, on the client?

I briefly read https://www.elastic.co/blog/elasticsearch-query-execution-order, but I am not sure I grasped it completely.
Is ES actually evaluating all the docs for matching terms in the non-relevant indices too if the _index filter comes after the query_string filter? Or is it just spending time gathering index stats in the initial phase of execution?

Thanks in advance.

The usual analogy when considering column order in composite indexes is that of a phone book, which is ordered by (surname, forename). This makes lookups by surname straightforward but doesn't help when looking up numbers by forename.
