Elasticsearch terms aggregation taking 5 seconds on 5 million documents

I'm using elasticsearch hosted service (elasticsearch.co) with 3 machines.
44%20AM

The query has a bool query and then an aggregation query. The bool query almost select 2M out of the 5M documents and then we run a terms Aggregation on the documents (on a nested field).

The cardinality of nested field is around 70-75.

The response is taking 4-6 seconds . I want to get this down to 1-2 seconds. I've tried increasing the RAM/CPU of machines and tried adding more machines, which didn't really turned out to be useful.

Is this response time OK for a running aggregations on nested field according to elasticsearch benchmarks? How can we improve it
Here's the aggregation query we are making:

[
{
"criteria": {
"nested": {
"path": "criteria"
},
"aggregations": {
"criteria_aggs": {
"terms": {
"field": "criteria.id",
"size": 200,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"avg_score": {
"avg": {
"field": "criteria.present"
}
}
}
}
}
}
}
]

What is the size of your documents? How many indices and shards does the query target? Can you show the output of _cat/indices ? How come you are using a highcpu node type instead of the standard highio? Do you have concurrent indexing going on?

Also what is the value of the took field of the response object ?

Average size of document seems to be 18KB (total primary size/total documents).
We are running 5 shards (with 1 primary and 2 replicated, so total 15 shards) on the index.

@Christian_Dahlqvist
not sure about the reason for using HighCPU node , will find out, do u think it can make a difference?

yes, we also have a service which is indexing documents in parallel. However, the aggregation latency I have mentioned is for the time where not much indexing is going on. At the time when indexing rate is high, the performance is even worse (look at the image below)

Here's the output of cat/indices :

health status index                               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .monitoring-es-6-2019.07.19         xBVm7xQ-RQeycENBX3A8dQ   1   1      59030          252     86.2mb         45.9mb
green  open   .monitoring-beats-6-2019.07.18      3UO99F09TGeYJjtYf8XqKw   1   1         31            0    725.4kb        362.7kb
green  open   .watcher-history-9-2019.07.18       5b0hfi3RQLKPgZXoyi8FJQ   1   1       8640            0     23.5mb         11.6mb
green  open   .kibana_7                           PjxwMPIRR-OTBVkPrYf1VQ   1   1         11            0     99.2kb         49.6kb
green  open   .monitoring-kibana-6-2019.07.18     SbhRB6G3RO-Jktl2h741Fw   1   1       8643            0        5mb          2.5mb
green  open   .triggered_watches-6                uE1_dKQeQxSI9Q4scCs6Yg   1   1          1            0      1.7mb            7kb
**green  open   ****_porterstem_minimal_english  djATmntkSGW1gcRxgVfUgg   5   2  470809156    119711028    276.7gb         90.6gb**
green  open   .security-6                         iVW38mvcR16O10lJ77OizQ   1   2          7            0    130.5kb         43.5kb
green  open   .monitoring-es-6-2019.07.17         C32br4NfS0myL3pf6eZKFA   1   1     220162          464    301.7mb          151mb
green  open   .kibana-6                           4GEkQk5EToOJtWLxERGBkg   1   1          2            0      7.1kb          3.5kb
green  open   .watches-6                          vKrCnjzIQHWWaduY5LGN6A   1   1          6            0    127.9kb         72.5kb
green  open   apm-6.8.1-onboarding-2019.07.18     mv6xfNSrQZq1aBA9s1uZyQ   1   2          1            0     17.8kb          5.9kb
green  open   .monitoring-kibana-6-2019.07.19     EyG7NNkqRHGHEsz6qftVuw   1   1       2093            0      1.7mb          1.1mb
green  open   .monitoring-es-6-2019.07.18         8LzdtOwKRyqgz_NHRDPMhA   1   1     235498          630    319.4mb        159.2mb
green  open   ****_porterstem_minimal_english4 YFmumZ_aTDSVU5sgOP4ZkA   5   1          0            0      2.5kb          1.2kb
green  open   .monitoring-kibana-6-2019.07.17     Cr_OAvPsTWabGs_QTJr46g   1   1       8640            0      4.6mb          2.3mb
green  open   .watcher-history-9-2019.07.17       5aPKdb3bSaiTD2vOMBQJsg   1   1       8640            0     24.8mb         12.3mb
green  open   .monitoring-alerts-6                N6oNBc9PRzG0NEQehe71RA   1   1         17            4     91.9kb         46.3kb
green  open   .watcher-history-9-2019.07.19       F-nYS0SASZWxBden2_dxow   1   1       2094            0        9mb            6mb
green  open   .kibana_task_manager                OQauSQZeQ0eOye6jQGtzLA   1   1          2            0     25.1kb         12.5kb

@dadoonet the mentioned latency (4-6 secs) is indicating the "took" parameter only in the ES response

Elasticsearch is often limited by disk performance and I believe ‘highio’ nodes are superior in that respect. I would usually use ‘highcpu’ nodes for ingest nodes or search use cases where the full data set fit in the OS cache.

@Christian_Dahlqvist I created a new cluster with same configurations (RAM and CPU) with same number of machines and loaded a snapshot of the same ES data as in our prod with highIO nodes instead of highCPU. The results were pretty same (latency rather increased a little).

In that case disk I/O may not be a factor. Your shards are however at a bit over 90GB quite large. I would recommend trying with a larger number of shards, e.g. 10, in order to see if that makes a difference.

I just noticed that you are running a nested aggregation. I rarely use this so am not sure about the impact that has on performance. If you are constantly updating documents, which seems to be the case given the stats, there are data structures that need to be recalculated after merges and I am wondering if this is affecting the results.

Have you considered instead using a flat, non-nested structure?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.