I'm trying to understand the query performance implications of index sorting when `track_total_hits` is true, given the query parameters listed below.
As far as I know, when `track_total_hits` is enabled, Elasticsearch still has to visit every segment and count all matching documents regardless of the `size` parameter, so it cannot perform early termination.
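For context, here is a minimal sketch of the setup being described. It assumes a hypothetical index whose date field is named `timestamp` and is sorted descending. The bodies are plain Python dicts shaped like the create-index and search REST request bodies. `index.sort.field` and `index.sort.order` are the documented index sorting settings. The field name, sort direction and the `top_n_body` helper are illustrative assumptions.

```python
import json

# Hypothetical index definition: within each segment, documents are written in
# `timestamp` order (descending). Index sorting can only be set at index creation.
CREATE_INDEX_BODY = {
    "settings": {
        "index": {
            "sort.field": "timestamp",
            "sort.order": "desc",
        }
    },
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
        }
    },
}


def top_n_body(n: int, track_total_hits: bool) -> dict:
    """Hypothetical helper: search body for the N newest docs, sorted like the index."""
    return {
        "size": n,
        "sort": [{"timestamp": "desc"}],
        # True: every matching doc must be counted, so segments are read to the end.
        # False: with the sort matching the index sort (and no aggregations),
        # each segment can stop after its first N matches.
        "track_total_hits": track_total_hits,
    }


if __name__ == "__main__":
    print(json.dumps(CREATE_INDEX_BODY, indent=2))
    print(json.dumps(top_n_body(10, track_total_hits=True), indent=2))
```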
Query parameters:
- aggregation on the `timestamp` field (Kibana Timelion)
- sort by `timestamp` (the sort order matches the index sort order)
- retrieval of the `_source` field
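Put together, the parameters above correspond roughly to a search body like the following. This is a sketch under assumptions. The aggregation name `over_time`, the `1h` interval and `size` of 10 are hypothetical. Timelion builds its own requests, so the exact body it sends may differ. `fixed_interval` on `date_histogram` assumes Elasticsearch 7.2 or later, where older versions used `interval`.

```python
import json


def build_search_body(n: int = 10, interval: str = "1h") -> dict:
    """Hypothetical request body combining the three query parameters listed above."""
    return {
        "track_total_hits": True,         # exact hit count on every request
        "size": n,                        # top N documents to fetch
        "sort": [{"timestamp": "desc"}],  # assumed to match index.sort.order
        "_source": True,                  # return the full _source of each hit
        "aggs": {
            # Time-series bucket counts, similar in spirit to what Timelion plots.
            "over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": interval,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body(), indent=2))
```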
In this case, I have questions about query performance during aggregation versus fetching the actual documents:
- During aggregation and counting of the total matching docs, will there be a performance improvement because the doc values of the matching documents are stored sequentially? Or is the improvement in this step minimal compared to an index without index sorting?
- During fetching of the top N documents, does index sorting also change how stored fields are laid out on disk? If so, since the documents that match the query are stored sequentially, would that translate into fewer disk seeks and better query performance?
Referencing this passage from the Elasticsearch index sorting documentation:
"Elasticsearch will detect that the top docs of each segment are already sorted in the index and will only compare the first N documents per segment. The rest of the documents matching the query are collected to count the total number of results and to build aggregations."
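To make the quoted behaviour concrete, here is a toy model in plain Python. It is a deliberately simplified illustration, not Elasticsearch or Lucene internals. It treats each segment as a list of timestamps already sorted descending and the query as a hypothetical `matches` predicate. Only the first N matches of each segment become top-N candidates. With `track_total_hits=True`, every remaining match is still visited to be counted. Building aggregations would need that same full pass. With `track_total_hits=False`, a segment is abandoned once it has produced N matches. The model says nothing about how doc values or stored fields are laid out on disk.

```python
import heapq
from typing import Callable, List, Tuple


def search_sorted_segments(
    segments: List[List[int]],
    matches: Callable[[int], bool],
    n: int,
    track_total_hits: bool = True,
) -> Tuple[List[int], int, int]:
    """Toy model of top-N search over segments pre-sorted by timestamp (desc).

    Returns (top_n, hits_counted, docs_visited). When track_total_hits is False,
    hits_counted is only a lower bound because segments are abandoned early.
    """
    candidates: List[int] = []
    hits_counted = 0
    docs_visited = 0
    for seg in segments:
        taken = 0
        for ts in seg:
            if taken == n and not track_total_hits:
                break  # early termination: nothing later in this segment can enter the top N
            docs_visited += 1
            if not matches(ts):
                continue
            hits_counted += 1  # counting still touches every visited match
            if taken < n:
                # The segment is already sorted, so its first n matches are its top n.
                candidates.append(ts)
                taken += 1
    # Merge the per-segment candidates into the global top N.
    top_n = heapq.nlargest(n, candidates)
    return top_n, hits_counted, docs_visited


if __name__ == "__main__":
    segments = [[9, 7, 5, 3, 1], [8, 6, 4, 2]]
    print(search_sorted_segments(segments, lambda ts: True, 2))         # ([9, 8], 9, 9)
    print(search_sorted_segments(segments, lambda ts: True, 2, False))  # ([9, 8], 4, 4)
```

Both runs return the same top 2, but the counting run visits every document while the early-terminating run visits only the first two of each segment.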
Thanks in advance for helping to clarify this.