How does the kibana "search profiler" "Query Profile" timing work?

I use version 7.13.1

The query I profile:

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "logstash"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "search_tags": "kibana"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "content.eng": "dead letter queue"
          }
        }
      ]
    }
  },
  "aggs": {
    "author": {
      "terms": {
        "field": "authors.last_name"
      }
    }
  }
}
```

[Image: Search Profiler "Query Profile" screenshot showing the timing breakdown]

How can the "Cumulative time" be 119.227 ms when the individual timings below it do not add up to that total?

@alisongoryachev can we please get some help here?

Thanks,
Bhavya

Hi Stefan,

After having tried to answer your question during the training delivery, I did some more research and also revisited our documentation about the Profile API [Profile API | Elasticsearch Guide [8.2] | Elastic]. (There is also dedicated documentation available for the Search Profiler UI [Profile queries and aggregations | Kibana Guide [8.2] | Elastic].)

As explained before, the response to a profiling request is usually quite complex, even for simple queries. As mentioned under Limitations [Profile API | Elasticsearch Guide [8.2] | Elastic], the reported timings do not include all aspects of the request. And in the section about the Timing Breakdown [Profile API | Elasticsearch Guide [8.2] | Elastic] you can find the following statement: "The intention of the breakdown is to give you a feel for A) what machinery in Elasticsearch is actually eating time, and B) the magnitude of differences in times between the various components. Like the overall time, the breakdown is inclusive of all children times." This is in line with the response I gave you during the class.
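The "inclusive of all children times" point can be illustrated with a small sketch. The numbers and node structure below are made up, but they mirror the shape of a profile response: each query node's `time_in_nanos` already contains the time of its child clauses, so naively adding a parent to its children double-counts.

```python
# Hypothetical, simplified fragment of a profile response.
# The BooleanQuery's time_in_nanos already INCLUDES its children's times.
profile_node = {
    "type": "BooleanQuery",
    "time_in_nanos": 119_227_000,  # 119.227 ms (made-up number)
    "children": [
        {"type": "TermQuery", "time_in_nanos": 40_000_000, "children": []},
        {"type": "TermQuery", "time_in_nanos": 25_000_000, "children": []},
    ],
}

def self_time(node):
    """Time spent in this node alone, excluding its children's inclusive times."""
    return node["time_in_nanos"] - sum(c["time_in_nanos"] for c in node["children"])

# The parent's own work is its reported time minus the children's times.
print(self_time(profile_node) / 1e6, "ms")  # 54.227 ms of 'own' work
```

So a parent figure larger than the sum of the visible children is expected: the difference is the parent's own work, which the breakdown does not list as a separate row.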

It will be quite hard, even for Lucene experts, to completely understand every single number reported in the response. And my guess is that the aggregated numbers do not sum up due to the distributed nature of execution, possibly double-counting timings of work that happens in parallel. Last but not least, even if you were able to track down every single number in a particular response, that could change again from one minor version to the next.
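To make the parallel-execution point concrete, here is a hypothetical sketch (the per-shard numbers are invented): if timings from shards that ran concurrently are summed, the cumulative figure can far exceed the wall-clock time of the request.

```python
# Made-up per-shard query times (ms) for a request that hits 3 shards in parallel.
shard_times_ms = [50.0, 40.0, 29.227]

# Summing per-shard times yields a large cumulative figure ...
cumulative_ms = sum(shard_times_ms)   # 119.227 ms

# ... while the request's wall-clock time is closer to the slowest shard.
wall_clock_ms = max(shard_times_ms)   # 50.0 ms

print(f"cumulative: {cumulative_ms:.3f} ms, wall clock: ~{wall_clock_ms:.3f} ms")
```

Neither figure is "wrong"; they simply measure different things, which is one plausible reason the numbers in the UI do not add up to each other.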

Elasticsearch may look simple because it hides a lot of its complexity. But under the hood, it is still a complex, distributed system, leveraging Lucene, which alone carries a lot of complexity. The Profile API is probably the one Elasticsearch API that gets you closest to that complexity, down to a very low-level, local Lucene view. Therefore it is not (and probably never will be) trivial to fully understand the individual numbers in the response and how those numbers sum up.