Cardinality query takes more than 10 seconds for buckets with ~15K unique values

Due to the nature of the data I work with, I have to create 5-minute buckets and then, inside each bucket, calculate the count of unique values (a cardinality aggregation).

Up until recently those unique values wouldn't exceed 2-3K per bucket, but due to increased traffic the cardinality of that field can now reach 15K.

After looking at the cardinality aggregation I saw that the default precision_threshold is 3000, yet my query takes around 10 seconds to complete.
I tried a precision_threshold of 100 and the query dropped to 4 seconds, which is much better, but still not great for my use case :slight_smile: .
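For reference, a query like the one described above can be sketched as follows. The field names (`@timestamp`, `user_id`) and index are hypothetical; the shape is a standard 5-minute `date_histogram` with a `cardinality` sub-aggregation, where `precision_threshold` trades accuracy for memory and speed:

```python
import json

# Hypothetical field names -- substitute your own timestamp and value fields.
# "size": 0 skips returning hits, since only the aggregations are needed.
query = {
    "size": 0,
    "aggs": {
        "per_5m": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "5m",
            },
            "aggs": {
                "unique_values": {
                    "cardinality": {
                        "field": "user_id",
                        "precision_threshold": 100,  # default is 3000
                    }
                }
            },
        }
    },
}

print(json.dumps(query, indent=2))
```

This body would be sent to the `_search` endpoint of the indices in question; lowering `precision_threshold` below the expected per-bucket cardinality (~15K here) makes the HyperLogLog++ sketch cheaper but less exact.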

From trying several queries I understood that the response time is a function of how high the cardinality is within the buckets I'm computing, which makes sense.

What I don't completely understand is: if each shard responds in ~500-900 ms (I profiled the query), how does the overall response take 10 seconds? Shouldn't the per-shard queries execute in parallel and then be aggregated?

Also, since this query takes the same time every time I execute it, I assume the aggregation results are not being served from cache, even though I did check the request cache stats for those indices and they were increasing after the first query.

Thanks in advance

Is that for each and every shard in the query? Because ultimately Elasticsearch is only as fast as the slowest shard response.

Yes, that's correct.
I just had a flash: I have 10 nodes with 4 cores each, and Elasticsearch assigns 7 search threads to each node.
With this query I'm hitting 31 indices * 5 shards = 155 shards in total.
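A back-of-the-envelope calculation with the numbers above suggests why per-shard time and total time diverge: 155 shard searches must be served by roughly 70 search threads cluster-wide, so shards are processed in "waves" and total latency is at least waves times the per-shard time, before counting queueing from other traffic and the reduce phase on the coordinating node:

```python
import math

# Figures from the thread above; a rough model, not an exact predictor.
shards = 31 * 5            # 155 shards targeted by the query
search_threads = 10 * 7    # 70 search threads across the cluster
waves = math.ceil(shards / search_threads)
print(waves)               # number of sequential "waves" of shard searches
```

With ~0.5-0.9 s per shard and 3 waves, several seconds of latency follow from the topology alone; contention with other concurrent queries and the final merge add the rest.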

I'm saying this because my older cluster (which, by the way, was in a datacenter) had 50 cores per node * 3 nodes = 150 search threads, but now I have 70 in total (new setup in the cloud).

Could this be related?

Quite probably, yes.


I might try increasing the search threads and benchmarking my queries, although I know it's not recommended to change those settings unless you really know what you are doing.

Something that is weird is that only the cardinality aggregation inside the buckets takes this long; other types of queries respond much faster.

There is not much point in increasing the thread pool size unless you have CPU cores to back it up with. The defaults are generally quite good. You may instead look to reduce the number of shards so that there is less work to process, resulting in fewer tasks being queued up.
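One way to reduce the shard count of existing indices is the shrink index API. The sketch below builds the two request bodies involved, assuming a hypothetical index with 5 primaries being shrunk to 1 (the target count must be a factor of the source count); the index must first be made read-only and have a copy of every shard on one node:

```python
import json

# Step 1: settings update to prepare the source index for shrinking.
# The node name "node-1" is hypothetical -- any single node that can
# hold a copy of every shard will do.
prepare_body = {
    "settings": {
        "index.blocks.write": True,
        "index.routing.allocation.require._name": "node-1",
    }
}

# Step 2: body for the _shrink request itself; null clears the
# temporary settings on the shrunken target index.
shrink_body = {
    "settings": {
        "index.number_of_shards": 1,
        "index.routing.allocation.require._name": None,
        "index.blocks.write": None,
    }
}

# These would be sent as:
#   PUT  /<source-index>/_settings            with prepare_body
#   POST /<source-index>/_shrink/<target>     with shrink_body
print(json.dumps(prepare_body))
print(json.dumps(shrink_body))
```

For time-based indices like the 31 daily ones in this thread, an alternative is simply creating new indices with fewer primaries going forward via an index template.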


Makes sense. I will decrease the shards and let you know.

Thanks :slight_smile:
