Why the numbers don't match when profiling a slow Elasticsearch request

I'm seeing some unexpectedly slow queries, so I ran the Profile API (Elasticsearch 7.4). However, it doesn't explain the slowness: all components take at most a couple of milliseconds, whereas the query took more than 1 second: "took" : 1254

Here's the query, which uses applicationId for routing. (It uses query_string rather than a typical must clause for unrelated reasons, but that doesn't affect query performance. Note also that the index has a default sort configured on timestamp, descending.)

POST indexname/_search?routing=cbcd0350-ba63-11e9-a4af-ed719166c0ae
{
    "profile": true,
    "query": {
        "bool": {
            "must": {
                "query_string": {
                    "query": "action:foo"
                }
            },
            "filter": [
                {
                    "terms": {
                        "applicationId": [
                            "cbcd0350-ba63-11e9-a4af-ed719166c0ae"
                        ]
                    }
                },
                {
                    "range": {
                        "timestamp": {
                            "gte": "1601142184297",
                            "lte": "1601143384297"
                        }
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "timestamp": {
                "order": "desc"
            }
        }
    ]
}
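
For reference, the index sort mentioned above is defined at index creation time. A minimal sketch of such a configuration (the actual settings and mappings of indexname weren't posted, so this is an assumption about how it's set up):

PUT indexname
{
    "settings": {
        "index.sort.field": "timestamp",
        "index.sort.order": "desc"
    },
    "mappings": {
        "properties": {
            "timestamp": { "type": "date" }
        }
    }
}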

Below is the profile result:

"profile" : {
    "shards" : [
      {
        "id" : "[9pyht_PVS0mTX_qoJMGhqg][indexname][12]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "+action:foo #ConstantScore(applicationId:cbcd0350-ba63-11e9-a4af-ed719166c0ae) #timestamp:[1601142184297 TO 1601143384297]",
                "time_in_nanos" : 9193115,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 4475919,
                  "match" : 0,
                  "next_doc_count" : 5994,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 243183,
                  "advance_count" : 18,
                  "score" : 0,
                  "build_scorer_count" : 38,
                  "create_weight" : 75323,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 4392639
                },
                "children" : [
                  {
                    "type" : "TermQuery",
                    "description" : "action:foo",
                    "time_in_nanos" : 818107,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 605683,
                      "advance_count" : 6012,
                      "score" : 0,
                      "build_scorer_count" : 56,
                      "create_weight" : 24653,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 181702
                    }
                  },
                  {
                    "type" : "ConstantScoreQuery",
                    "description" : "ConstantScore(applicationId:cbcd0350-ba63-11e9-a4af-ed719166c0ae)",
                    "time_in_nanos" : 1548337,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 1388326,
                      "advance_count" : 6012,
                      "score" : 0,
                      "build_scorer_count" : 54,
                      "create_weight" : 8210,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 145734
                    },
                    "children" : [
                      {
                        "type" : "TermQuery",
                        "description" : "applicationId:cbcd0350-ba63-11e9-a4af-ed719166c0ae",
                        "time_in_nanos" : 704814,
                        "breakdown" : {
                          "set_min_competitive_score_count" : 0,
                          "match_count" : 0,
                          "shallow_advance_count" : 0,
                          "set_min_competitive_score" : 0,
                          "next_doc" : 0,
                          "match" : 0,
                          "next_doc_count" : 0,
                          "score_count" : 0,
                          "compute_max_score_count" : 0,
                          "compute_max_score" : 0,
                          "advance" : 593783,
                          "advance_count" : 6012,
                          "score" : 0,
                          "build_scorer_count" : 54,
                          "create_weight" : 4011,
                          "shallow_advance" : 0,
                          "create_weight_count" : 1,
                          "build_scorer" : 100953
                        }
                      }
                    ]
                  },
                  {
                    "type" : "IndexOrDocValuesQuery",
                    "description" : "timestamp:[1601142184297 TO 1601143384297]",
                    "time_in_nanos" : 4533095,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 542974,
                      "match" : 0,
                      "next_doc_count" : 5994,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 212511,
                      "advance_count" : 1996,
                      "score" : 0,
                      "build_scorer_count" : 54,
                      "create_weight" : 1122,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 3768443
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 50858,
            "collector" : [
              {
                "name" : "CancellableCollector",
                "reason" : "search_cancelled",
                "time_in_nanos" : 2098312,
                "children" : [
                  {
                    "name" : "SimpleFieldCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 812015
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

The issue (apart from the query being slow) is that the Profile API reports 9193115 nanos, i.e. about 9 ms for the query, plus about 2 ms for collection. Which other stages can make the request so much slower, given that only one shard is queried thanks to the routing?
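
One way to chase the missing time (a sketch, not a definitive answer): the Profile API only measures the query and collection phases on each shard; it does not account for time spent waiting in the search queue, the fetch phase, network overhead, or merging shard responses on the coordinating node. The search slow log can time the fetch phase separately, and the thread pool stats show queueing. The thresholds below are arbitrary examples:

PUT indexname/_settings
{
    "index.search.slowlog.threshold.query.warn": "1s",
    "index.search.slowlog.threshold.fetch.warn": "500ms"
}

GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected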

Hi, I'm stuck on a similar problem. Please post here if you have any findings. Thanks.

I tried turning off adaptive replica selection (https://www.elastic.co/blog/improving-response-latency-in-elasticsearch-with-adaptive-replica-selection) and it seems to have helped a little, but it might be a coincidence.
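
For anyone wanting to try the same thing, this is the cluster setting involved (a sketch; whether persistent or transient suits you better depends on your setup):

PUT _cluster/settings
{
    "persistent": {
        "cluster.routing.use_adaptive_replica_selection": false
    }
}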
