I created a simple query which looks for specific 5 car ID's and aggregates upon them. It's extremely fast, takes about 300-500ms. But if I add a nested aggregation which looks only for a specific colour in those 5 ID's, it suddenly takes 10 seconds or longer. This doesn't make sense to me, anyone got an explanation? Sample query down below:
300-500ms for a simple unscored filter is an order of magnitude slower than I would expect. When you do the query without the aggs, how many hits does it report? If it's hundreds of millions across many shards, well, that could take some time to aggregate over.
I haven't used the Profile API much, but maybe it can shed some light on it.
I can't fathom how aggregating over 78 hits would take 10 seconds.
I'd be curious to see what the underlying OS's are doing while this query is laboring for 10 seconds. Specifically, is there I/O wait with low CPU?
Just guessing here as I don't know what version you are running, what the mappings look like, how many data nodes there are, how many shards are in the index, how big the largest shards are, how big your JVM heap is, what kinds of storage you are using, if you have changed the default configuration in any way, etc.
Yeah, me neither, extremely odd. Any other query so far, even nested aggregations, has been supremely fast, even when aggregating on all the data. Yet, on only 78 hits all hell breaks lose lol.
No I/O wait with low CPU, Running v6.3.2.
All fields are just keyword type, except typeId which is long.
The cluster is huge, not sure how many data nodes or shards there are, but I'm pretty confident they aren't the problem
Nope, it reduces the speed down to a few hundred ms.
Pretty confident there isn't much problems with the architecture.
Noticed that the slowdown is caused by 'typeId: 8' field because it's an integer.
ES somehow does two operations: compares a range of numbers >8 and <8 instead of just =8.
Guess I should convert the field to a keyword type.. or are there any other ideas without the need for reindexing?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.