Nested aggregation slows query from ~400ms to 10s

newnewb · September 6, 2018, 9:41am

Hello everyone,

I created a simple query which looks for specific 5 car ID's and aggregates upon them. It's extremely fast, takes about 300-500ms. But if I add a nested aggregation which looks only for a specific colour in those 5 ID's, it suddenly takes 10 seconds or longer. This doesn't make sense to me, anyone got an explanation? Sample query down below:

{
    "stored_fields": [
        ...
    ],
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "typeId": 8
                            }
                        },
                        {
                            "terms": {
                                "carId": [
                                    2796963847,
                                    2550869,
                                    152794103,
                                    7678419,
                                    338
                                ]
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "cars": {
            "terms": {
                "field": "carId"
            },
            "size": 25,
            "aggs": {
                "colors": {
                    "terms": {
                        "field": "colourId",
                        "include": [
                            534278
                        ]
                    }
                }
            }
        }
    },
    "size": 0
}

loren · September 6, 2018, 1:49pm

300-500ms for a simple unscored filter is an order of magnitude slower than I would expect. When you do the query without the aggs, how many hits does it report? If it's hundreds of millions across many shards, well, that could take some time to aggregate over.

I haven't used the Profile API much, but maybe it can shed some light on it.

newnewb · September 6, 2018, 2:10pm

Mostly varies, can be lower at times. Without the aggs, it has 78 hits, but the database is huge.

Used Search Profiler in Kibana already, it looks something like this:

PS: The response is pre-cached already, so it's down to 550ms.
carId=keywordId

loren · September 6, 2018, 4:10pm

I can't fathom how aggregating over 78 hits would take 10 seconds.

I'd be curious to see what the underlying OS's are doing while this query is laboring for 10 seconds. Specifically, is there I/O wait with low CPU?

Just guessing here as I don't know what version you are running, what the mappings look like, how many data nodes there are, how many shards are in the index, how big the largest shards are, how big your JVM heap is, what kinds of storage you are using, if you have changed the default configuration in any way, etc.

Mark_Harwood · September 7, 2018, 1:54am

Try the terms agg with an ‘execution_hint’ of ‘map’

newnewb · September 7, 2018, 7:49am

Yeah, me neither, extremely odd. Any other query so far, even nested aggregations, has been supremely fast, even when aggregating on all the data. Yet, on only 78 hits all hell breaks lose lol.

No I/O wait with low CPU, Running v6.3.2.
All fields are just keyword type, except typeId which is long.
The cluster is huge, not sure how many data nodes or shards there are, but I'm pretty confident they aren't the problem

newnewb · September 7, 2018, 7:51am

Thanks for the hint!

Just tried combinations on both term aggregations, unfortunately speed stays the same (or even slows down in some cases a little bit).

Mark_Harwood · September 7, 2018, 11:47am

Crazy docs with millions of colour terms?

loren · September 8, 2018, 1:43pm

If you run the 10s agg query several times in a row, does it always take 10s?

Also, how confident are you that none of your nodes are swapping?

newnewb · September 10, 2018, 2:12pm

One colour per doc, but milions of docs

newnewb · September 10, 2018, 2:18pm

Nope, it reduces the speed down to a few hundred ms.
Pretty confident there isn't much problems with the architecture.

Noticed that the slowdown is caused by 'typeId: 8' field because it's an integer.
ES somehow does two operations: compares a range of numbers >8 and <8 instead of just =8.
Guess I should convert the field to a keyword type.. or are there any other ideas without the need for reindexing?

system · October 8, 2018, 2:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Troubleshooting Slow Aggregation Query Elasticsearch	7	3480	July 5, 2017
Nested aggregation of 7 documents takes 1 min Elasticsearch	22	818	March 3, 2020
Nested aggregation got too slow result Elasticsearch	12	562	May 22, 2018
Aggregation on nested document always takes 2-3 seconds? Elasticsearch	2	527	July 6, 2017
Very slow aggregate query Elasticsearch	16	1643	November 12, 2021

Nested aggregation slows query from ~400ms to 10s

Related topics