How to calculate search time spent by Elasticsearch on its inverted index alone?

kmdabdulla · May 16, 2020, 11:15pm

The took field gives the overall search time, but is there any way to find the time Elasticsearch spends on its inverted index alone?

Christian_Dahlqvist · May 17, 2020, 9:26am

Have you looked at the explain API?

polyfractal · May 18, 2020, 5:16pm

In addition to Explain, the Profile API can be helpful too.

It's a little more complicated than just the time taken by the inverted index, since in reality there are many steps to interacting with the index (advancing iterators to documents, scoring, two-phase iteration for things like phrases, etc.). But the query profiler should give you some more insight.

Do note that the profiler adds significant overhead to query execution, so times should only be compared in a relative manner. Concurrent execution across several shards can also make the results tricky to read. E.g. the overall wall-clock took time might be 15s, but in the profile results you see 10 shards that each took 10s. This indicates that some shards were executing concurrently (otherwise the took time would be 100s) and some sequentially (otherwise the took time would be 10s).
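The arithmetic in that example can be sketched as follows. This is a minimal Python sketch and not part of the Profile API: the function name took_bounds and the per-shard figures are illustrative, taken from the hypothetical 10-shard case described in the post.

```python
def took_bounds(shard_times_s):
    """Return (lower, upper) bounds on wall-clock time for per-shard times.

    lower: every shard ran fully in parallel, so the slowest shard dominates.
    upper: every shard ran one after another, so the times add up.
    """
    return max(shard_times_s), sum(shard_times_s)


# Hypothetical figures from the example: 10 shards at 10s each, took = 15s.
shard_times = [10.0] * 10
took = 15.0

lower, upper = took_bounds(shard_times)
if lower < took < upper:
    verdict = "mixed: some shards ran concurrently, some sequentially"
elif took <= lower:
    verdict = "fully concurrent"
else:
    verdict = "fully sequential"

print(lower, upper, verdict)
```

Because the profiler itself adds overhead, the bounds are only a rough guide to how shard execution overlapped.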

kmdabdulla · May 28, 2020, 8:33pm

Sorry for getting back late. I will definitely try this. Thanks for your time.

kmdabdulla · May 28, 2020, 8:35pm

I am sorry for getting back late. Thanks for your comment. I will try it!

kmdabdulla · June 1, 2020, 8:29pm

I have indexed 2 million documents and I am trying to return all the matching document ids at once. I use the PHP client.

My index settings and mapping are as follows:

$params = [
    'index' => $index,
    'body' => [
        'settings' => [
            "number_of_shards" => 1,
            "number_of_replicas" => 0,
            "index.queries.cache.enabled" => false,
            "index.soft_deletes.enabled" => false,
            "index.refresh_interval" => -1,
            "index.requests.cache.enable" => false,
            "index.max_result_window" => $result_window
        ],
        'mappings' => [
            '_source' => [
                "enabled" => false
            ],
            'properties' => [
                "text" => [
                    "type" => "text",
                    "index_options" => "docs"
                ]
            ]
        ]
    ]
];

My query string is as follows:

$json = '{
    "from": 0,
    "size": '.$size.',
    "profile": true,
    "query": {
        "bool": {
            "filter": {
                "match": {
                    "text": {
                        "query": "justin trump clinton harry",
                        "operator": "and"
                    }
                }
            }
        }
    }
}';

The goal is to get all the matching documents at once. I need only the document ids (that is, I only need to check whether the given terms exist in a document), so I set index_options to docs. I know about the scroll API, but I want to use max_result_window instead. I am using only one shard and no replicas, and I use a filter clause so that documents are not scored during the search.
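For readers who build the request in code rather than by string concatenation, the same body can be assembled as a data structure and serialized. The following is a minimal Python sketch, not the PHP client code from the post: the function name build_search_body and the size value of 10000 are illustrative assumptions, while the field names mirror the query shown earlier in this post.

```python
import json


def build_search_body(terms, size, profile=True):
    """Build a filter-only match query that requires every term (operator "and")."""
    return {
        "from": 0,
        "size": size,
        "profile": profile,
        "query": {
            "bool": {
                "filter": {
                    "match": {
                        "text": {"query": " ".join(terms), "operator": "and"}
                    }
                }
            }
        },
    }


# The size here is a placeholder; the post uses its own $size value.
body = build_search_body(["justin", "trump", "clinton", "harry"], size=10000)
print(json.dumps(body, indent=2))
```

Serializing a dictionary guarantees well-formed JSON regardless of what value is substituted for the size.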

My questions are as follows:

1. I want to retrieve only document ids and skip the document fetch phase, so I disabled the _source field. To avoid the other metadata, I tried the following, as per this link (avoid fetch phase), but I can still see the document type and index name. Is there anything else I need to do to get only document ids and avoid the fetch phase?

   "stored_fields": "_none_",
   "docvalue_fields": ["_id"]

2. Since I am retrieving all the matching documents, scoring is irrelevant to me, so I used a filter clause. Why am I still getting a BoostQuery timing in the Profile API results below? Note that the BooleanQuery score timing is zero.
3. To know how much time the Boolean query search took on the Lucene index alone, should I just take the time reported by the BooleanQuery, or do I need to add up the timings of all its children (the TermQuery entries)? When I add up the TermQuery timings, the total is higher than the time reported by the BooleanQuery. Is there any possible reason for this?
4. Do I need to include the collector time in my Boolean query timing as well? The Profile API documentation says that "Lucene works by defining a "Collector" which is responsible for coordinating the traversal, scoring, and collection of matching documents." It also says that "It should be noted that Collector times are independent from the Query times. They are calculated, combined, and normalized independently! Due to the nature of Lucene’s execution, it is impossible to "merge" the times from the Collectors into the Query section, so they are displayed in separate portions". As I understand it, the collector helps in traversing the postings lists of the Lucene index to execute the Boolean query. Am I right in this regard?
5. Is there a similar API for investigating indexing time in Elasticsearch? I was able to get the indexing time from the settings API, but I am looking for something similar to the Profile API.
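To compare a query node's reported time with the sum of its children, the profile tree can be walked programmatically. The following is a minimal Python sketch, assuming the "profile" section of the response has the documented layout (shards, then searches, then query and collector lists, with type, time_in_nanos and children on each query node). The helper names, the collector name and all numbers in sample_profile are made up for illustration and are not the output from this thread.

```python
def self_and_child_times(node):
    """Return (reported_ns, children_sum_ns) for one profile query node."""
    children = node.get("children", [])
    return node["time_in_nanos"], sum(c["time_in_nanos"] for c in children)


def walk(node, depth=0):
    """Yield (depth, type, reported_ns, children_sum_ns) for a node and its descendants."""
    reported, child_sum = self_and_child_times(node)
    yield depth, node["type"], reported, child_sum
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# Illustrative stand-in for response["profile"]; every value is hypothetical.
sample_profile = {
    "shards": [{
        "searches": [{
            "query": [{
                "type": "BoostQuery",
                "time_in_nanos": 900,
                "children": [{
                    "type": "BooleanQuery",
                    "time_in_nanos": 800,
                    "children": [
                        {"type": "TermQuery", "time_in_nanos": 300},
                        {"type": "TermQuery", "time_in_nanos": 250},
                    ],
                }],
            }],
            "collector": [{"name": "ExampleCollector", "time_in_nanos": 400}],
        }]
    }]
}

rows = []
collector_ns = 0
for shard in sample_profile["shards"]:
    for search in shard["searches"]:
        for root in search["query"]:
            rows.extend(walk(root))
        # Collector times are reported separately from the query tree.
        collector_ns += sum(c["time_in_nanos"] for c in search["collector"])

for depth, qtype, reported, child_sum in rows:
    print("  " * depth + f"{qtype}: reported={reported}ns children={child_sum}ns")
print(f"collector total: {collector_ns}ns")
```

Printing the reported and summed figures side by side makes it easy to spot nodes where the two disagree, which is the situation described in question 3.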

I apologize for asking so many questions at once. I would highly appreciate your help in this regard. I hope I have provided the necessary information. Please let me know if you need anything else.
Please find the Profile API output in the following reply.
Thank you!

kmdabdulla · June 1, 2020, 8:33pm

system · June 29, 2020, 8:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
