Index selection optimisation when _index filter is used

vangap · April 3, 2020, 5:21am

Take a query like the one below, where the index pattern in the request path has a wider scope, but we are only interested in results from a subset of the indices.

GET 123-logs-*/_search?size=1
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": ["message"],
            "query": "*error*"
          }
        },
        {
          "term": {
            "_index": {
              "value": "*logs-2020*"
            }
          }
        }
      ]
    }
  }
}

Technically speaking, the index pattern (logs-2020) that applies to only the subset of results could have been used in the request path itself, avoiding a filter on _index, at least in this case. But because of the way we let our users build queries on their data and the way we convert them to ES queries, we end up with these kinds of queries.
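
To illustrate what moving the _index pattern into the request path could look like on the client side, here is a minimal Python sketch. It assumes the client already has a list of concrete index names (the names below are made up for illustration) and that only `*` is used as a wildcard; `narrow_indices` and `wildcard_to_regex` are hypothetical helpers, not part of any Elasticsearch client.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern where '*' matches any run of characters."""
    # Escape every literal part so characters like '.' are not treated as regex syntax.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def narrow_indices(path_pattern, index_filter_pattern, concrete_indices):
    """Return the indices that match both the request-path pattern and the _index filter."""
    path_re = wildcard_to_regex(path_pattern)
    filter_re = wildcard_to_regex(index_filter_pattern)
    return [
        name
        for name in concrete_indices
        if path_re.fullmatch(name) and filter_re.fullmatch(name)
    ]


# Hypothetical index names, for illustration only.
indices = [
    "123-logs-2019.12",
    "123-logs-2020.01",
    "123-logs-2020.02",
    "456-logs-2020.01",
]

selected = narrow_indices("123-logs-*", "*logs-2020*", indices)
request_path = ",".join(selected) + "/_search"
print(request_path)
```

With the narrowed path, the _index term clause could be dropped from the query body, so only the indices of interest are contacted at all.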

The order of this filter in the must clause seems to affect performance: the query_string clause appears to be evaluated even on indices that don't match the _index filter pattern.

When I profiled this query, I could see that a MatchNoDocsQuery is added, but also that query execution spends time on indices that the user is not interested in.

                     {
                        "type": "BooleanQuery",
                        "description": "+message:*error* +MatchNoDocsQuery(\"Index didn't match. Index queried: abcxyz vs. [2a 5f 63 36 33 32 35 62 2d 63 72 30 30 33 34 2d 61 67 69 6c 65 6f 6e 65 5f 70 65 67 61 70 72 6f 6a 6d 67 6d 74 5f 77 6f 72 6b 5f 62 75 67 5f 64 64]\")",
                        "time_in_nanos": 2013705,
                        "breakdown": {
                           "set_min_competitive_score_count": 0,
                           "match_count": 0,
                           "shallow_advance_count": 0,
                           "set_min_competitive_score": 0,
                           "next_doc": 0,
                           "match": 0,
                           "next_doc_count": 0,
                           "score_count": 0,
                           "compute_max_score_count": 0,
                           "compute_max_score": 0,
                           "advance": 0,
                           "advance_count": 0,
                           "score": 0,
                           "build_scorer_count": 3,
                           "create_weight": 6526,
                           "shallow_advance": 0,
                           "create_weight_count": 1,
                           "build_scorer": 2007175
                        },
                        "children": [
                           {
                              "type": "MultiTermQueryConstantScoreWrapper",
                              "description": "message:*error*",
                              "time_in_nanos": 2001679,
                              "breakdown": {
                                 "set_min_competitive_score_count": 0,
                                 "match_count": 0,
                                 "shallow_advance_count": 0,
                                 "set_min_competitive_score": 0,
                                 "next_doc": 0,
                                 "match": 0,
                                 "next_doc_count": 0,
                                 "score_count": 0,
                                 "compute_max_score_count": 0,
                                 "compute_max_score": 0,
                                 "advance": 0,
                                 "advance_count": 0,
                                 "score": 0,
                                 "build_scorer_count": 3,
                                 "create_weight": 379,
                                 "shallow_advance": 0,
                                 "create_weight_count": 1,
                                 "build_scorer": 2001296
                              }
                           },
                           {
                              "type": "MatchNoDocsQuery",
                              "description": "MatchNoDocsQuery(\"Index didn't match. Index queried: abcxyz vs. [2a 5f 63 36 33 32 35 62 2d 63 72 30 30 33 34 2d 61 67 69 6c 65 73 74 75 64 69 6f 2d 73 74 6e 67 2d 61 67 69 6c 65 73 74 75 64 69 6f 2d 73 74 67 31 2d 63 6c 6f 6e 65 5f 70 65 67 61 70 72 6f 6a 6d 67 6d 74 5f 77 6f 72 6b 5f 62 75 67 5f 64 64]\")",
                              "time_in_nanos": 513,
                              "breakdown": {
                                 "set_min_competitive_score_count": 0,
                                 "match_count": 0,
                                 "shallow_advance_count": 0,
                                 "set_min_competitive_score": 0,
                                 "next_doc": 0,
                                 "match": 0,
                                 "next_doc_count": 0,
                                 "score_count": 0,
                                 "compute_max_score_count": 0,
                                 "compute_max_score": 0,
                                 "advance": 0,
                                 "advance_count": 0,
                                 "score": 0,
                                 "build_scorer_count": 2,
                                 "create_weight": 213,
                                 "shallow_advance": 0,
                                 "create_weight_count": 1,
                                 "build_scorer": 297
                              }
                           }
                        ]
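
To make it easier to see where the time goes in a profile fragment like the one above, here is a small Python sketch. It relies on the documented behaviour that time_in_nanos includes the time of all children, and subtracts the children's time to get each node's own time. The `flatten_profile` helper is hypothetical, and the sample node is trimmed down to the fields and values shown in the profile output.

```python
def flatten_profile(node, depth=0):
    """Yield one row per profile node with its total and self time in nanoseconds."""
    children = node.get("children", [])
    child_time = sum(child["time_in_nanos"] for child in children)
    yield {
        "depth": depth,
        "type": node["type"],
        "total_nanos": node["time_in_nanos"],
        # time_in_nanos is inclusive of children, so subtract them for self time.
        "self_nanos": node["time_in_nanos"] - child_time,
    }
    for child in children:
        yield from flatten_profile(child, depth + 1)


# Trimmed copy of the profile fragment, keeping only type, time and children.
profile_node = {
    "type": "BooleanQuery",
    "time_in_nanos": 2013705,
    "children": [
        {"type": "MultiTermQueryConstantScoreWrapper", "time_in_nanos": 2001679},
        {"type": "MatchNoDocsQuery", "time_in_nanos": 513},
    ],
}

rows = list(flatten_profile(profile_node))
slowest = max(rows, key=lambda row: row["self_nanos"])
for row in rows:
    print("  " * row["depth"] + row["type"], row["total_nanos"], row["self_nanos"])
```

On this fragment the wildcard clause (MultiTermQueryConstantScoreWrapper) accounts for almost all of the time, nearly all of it in build_scorer, while the MatchNoDocsQuery itself costs next to nothing.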

I would have expected ES to do some kind of optimisation here and avoid running this query against the non-relevant indices when the query is rewritten internally.

Maybe I am oversimplifying, so I would like to know: is the onus of choosing the optimal index pattern in the request path, and of ordering the "_index" filter in the must clause, on the client?

I read https://www.elastic.co/blog/elasticsearch-query-execution-order briefly, but I am not sure I grasped it completely.
Is ES actually evaluating all the docs for matching terms in the non-relevant indices when the _index filter comes after the query_string filter? Or is it just spending time getting index stats during the initial phase of execution?

Thanks in advance.

Enright · April 3, 2020, 6:08am

The usual analogy when considering column order in composite indexes is that of a phone book. This is ordered by (surname, first name). This makes lookups by surname straightforward but doesn't help you look up numbers by first name.
