Slow top_hits aggregation

skyflyer · November 15, 2016, 10:04am

Hi!

We're using Elasticsearch 2.4, and I've discovered a bottleneck in my query. By elimination, I've identified it as a "top_hits" sub-aggregation that I'm performing on a "terms" aggregation; the query runs for up to 15 seconds on a capable server (48 CPUs, 32 GB of memory). There are fewer than 3M documents in the relevant indices, and the mapping is doubly nested (if that's even a term :).

The document structure (simplified) is:
{
  "@timestamp": 123,
  "services": [ // nested
    {
      "id": 1,
      "pids": [ // nested
        { "id": 1, "name": "foobar", "value": 7 },
        { "id": 2, "name": "foobar2", "value": 12 }
      ]
    }
  ]
}

The query in question searches for a specific service and performs aggregations based on the "value" fields. Since pids are dynamic, I need to know the id, name and some other attributes of each pid in an aggregation. The complete aggregation is structured like this:

query: basic filters on the top level
aggs:
 - date histogram (90 days, interval around 120 minutes)
   - nested services
      - nested pids
         - terms aggregation on pids (scripted, because I combine pid + name to get unique combos)
            - average agg
            - top_hits aggregation so I get the PID data (id, name, etc)
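For readers who want to see the outline above as a request body, here is a rough sketch in Python that builds it as a plain dict. The field paths (`services.pids.id`, `services.pids.name`, `services.pids.value`), the aggregation names, the `now-90d` range filter and the script text are all assumptions reconstructed from the simplified document above, not the actual query; the filter on the specific service is left out.

```python
import json


def build_request_body(with_top_hits=True):
    """Build a search body mirroring the outline above (all names are illustrative)."""
    # Sub-aggregations computed for every pid bucket.
    per_pid = {"avg_value": {"avg": {"field": "services.pids.value"}}}
    if with_top_hits:
        # size 1: only a single hit per bucket is needed for the pid details.
        per_pid["pid_details"] = {"top_hits": {"size": 1}}

    # Assumed script: combine pid id and name into one bucket key.
    combo_script = (
        "doc['services.pids.id'].value + '|' + doc['services.pids.name'].value"
    )

    return {
        "size": 0,  # only the aggregations are of interest, not the search hits
        "query": {
            "bool": {
                # Placeholder for the "basic filters on the top level".
                "filter": [{"range": {"@timestamp": {"gte": "now-90d"}}}]
            }
        },
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "@timestamp", "interval": "2h"},
                "aggs": {
                    "services": {
                        "nested": {"path": "services"},
                        "aggs": {
                            "pids": {
                                "nested": {"path": "services.pids"},
                                "aggs": {
                                    "by_pid": {
                                        "terms": {"script": {"inline": combo_script}},
                                        "aggs": per_pid,
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request_body(), indent=2))
```

Passing `with_top_hits=False` gives the variant without the innermost top_hits aggregation, which makes it easy to compare the two requests.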

The top_hits aggregation seems to be the slowest part, although I don't really know exactly how to measure the different parts of the query (if there is an API for that, I've failed to find it). I tested by removing the top_hits agg from its parent aggregation, and the query runs faster that way.

Without top_hits at the bottom: 404ms
With it: 2680ms

I know, these measurements are not scientific, but they give a rough perspective.
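The comparison described above (same request, with and without top_hits) can be sketched roughly as follows. The helper `without_agg_type` and the small example tree are hypothetical and only illustrate the idea of stripping one aggregation type from an aggs tree; the two timings are the ones quoted above.

```python
import copy


def without_agg_type(aggs, agg_type):
    """Return a copy of an aggs tree with every aggregation of agg_type removed."""
    result = {}
    for name, definition in aggs.items():
        if agg_type in definition:
            continue  # drop this aggregation (and anything nested under it)
        cleaned = copy.deepcopy(definition)
        sub_aggs = cleaned.get("aggs")
        if sub_aggs is not None:
            remaining = without_agg_type(sub_aggs, agg_type)
            if remaining:
                cleaned["aggs"] = remaining
            else:
                del cleaned["aggs"]  # no sub-aggregations left
        result[name] = cleaned
    return result


# Illustrative tree; aggregation and field names are assumptions.
example_aggs = {
    "by_pid": {
        "terms": {"field": "services.pids.id"},
        "aggs": {
            "avg_value": {"avg": {"field": "services.pids.value"}},
            "pid_details": {"top_hits": {"size": 1}},
        },
    }
}

stripped = without_agg_type(example_aggs, "top_hits")

# Timings quoted in the post, in milliseconds.
with_top_hits_ms = 2680
without_top_hits_ms = 404
slowdown = with_top_hits_ms / without_top_hits_ms  # roughly 6.6x

if __name__ == "__main__":
    print(sorted(stripped["by_pid"]["aggs"]))
    print("slowdown: %.1fx" % slowdown)
```

Running both variants several times and comparing the `took` value in each response would make the measurement less sensitive to caching effects.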

For the top_hits aggregation, I really don't care whether it is a top hit -- I just need one hit, in order to get the details of the nested item.

I'll appreciate any insights as to how I might restructure my query/aggregations.

Thanks!

