The query has a bool query followed by an aggregation. The bool query selects almost 2M out of the 5M documents, and we then run a terms aggregation on those documents (on a nested field).
The cardinality of the nested field is around 70-75.
The response takes 4-6 seconds. I want to get this down to 1-2 seconds. I've tried increasing the RAM/CPU of the machines and adding more machines, neither of which turned out to be useful.
Is this response time expected for aggregations on a nested field according to Elasticsearch benchmarks? How can we improve it?
Here's the aggregation query we are making:
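A generic sketch of this kind of request, with hypothetical index and field names (my_index, status, tags, tags.name), would look roughly like this:

```
POST /my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } }
      ]
    }
  },
  "aggs": {
    "by_tag": {
      "nested": { "path": "tags" },
      "aggs": {
        "tag_names": {
          "terms": { "field": "tags.name", "size": 100 }
        }
      }
    }
  }
}
```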
What is the size of your documents? How many indices and shards does the query target? Can you show the output of _cat/indices? How come you are using a highcpu node type instead of the standard highio? Do you have concurrent indexing going on?
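For reference, the indices overview asked about here is just the output of a plain console request:

```
GET _cat/indices?v
```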
The average document size seems to be around 18KB (total primary size / total documents).
The index has 5 primary shards with 2 replicas each, so 15 shards in total.
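(That roughly checks out against the index stats below: about 90.6 GB of primary storage divided by the ~5M documents mentioned above comes to roughly 18 KB per document.)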
@Christian_Dahlqvist
I'm not sure about the reason for using the highcpu node type; I will find out. Do you think it can make a difference?
Yes, we also have a service that indexes documents in parallel. However, the aggregation latency I mentioned is for periods when not much indexing is going on. When the indexing rate is high, the performance is even worse (see the image below).
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .monitoring-es-6-2019.07.19 xBVm7xQ-RQeycENBX3A8dQ 1 1 59030 252 86.2mb 45.9mb
green open .monitoring-beats-6-2019.07.18 3UO99F09TGeYJjtYf8XqKw 1 1 31 0 725.4kb 362.7kb
green open .watcher-history-9-2019.07.18 5b0hfi3RQLKPgZXoyi8FJQ 1 1 8640 0 23.5mb 11.6mb
green open .kibana_7 PjxwMPIRR-OTBVkPrYf1VQ 1 1 11 0 99.2kb 49.6kb
green open .monitoring-kibana-6-2019.07.18 SbhRB6G3RO-Jktl2h741Fw 1 1 8643 0 5mb 2.5mb
green open .triggered_watches-6 uE1_dKQeQxSI9Q4scCs6Yg 1 1 1 0 1.7mb 7kb
green open ****_porterstem_minimal_english djATmntkSGW1gcRxgVfUgg 5 2 470809156 119711028 276.7gb 90.6gb
green open .security-6 iVW38mvcR16O10lJ77OizQ 1 2 7 0 130.5kb 43.5kb
green open .monitoring-es-6-2019.07.17 C32br4NfS0myL3pf6eZKFA 1 1 220162 464 301.7mb 151mb
green open .kibana-6 4GEkQk5EToOJtWLxERGBkg 1 1 2 0 7.1kb 3.5kb
green open .watches-6 vKrCnjzIQHWWaduY5LGN6A 1 1 6 0 127.9kb 72.5kb
green open apm-6.8.1-onboarding-2019.07.18 mv6xfNSrQZq1aBA9s1uZyQ 1 2 1 0 17.8kb 5.9kb
green open .monitoring-kibana-6-2019.07.19 EyG7NNkqRHGHEsz6qftVuw 1 1 2093 0 1.7mb 1.1mb
green open .monitoring-es-6-2019.07.18 8LzdtOwKRyqgz_NHRDPMhA 1 1 235498 630 319.4mb 159.2mb
green open ****_porterstem_minimal_english4 YFmumZ_aTDSVU5sgOP4ZkA 5 1 0 0 2.5kb 1.2kb
green open .monitoring-kibana-6-2019.07.17 Cr_OAvPsTWabGs_QTJr46g 1 1 8640 0 4.6mb 2.3mb
green open .watcher-history-9-2019.07.17 5aPKdb3bSaiTD2vOMBQJsg 1 1 8640 0 24.8mb 12.3mb
green open .monitoring-alerts-6 N6oNBc9PRzG0NEQehe71RA 1 1 17 4 91.9kb 46.3kb
green open .watcher-history-9-2019.07.19 F-nYS0SASZWxBden2_dxow 1 1 2094 0 9mb 6mb
green open .kibana_task_manager OQauSQZeQ0eOye6jQGtzLA 1 1 2 0 25.1kb 12.5kb
@dadoonet the latency I mentioned (4-6 secs) refers only to the "took" parameter in the ES response.
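That is, the number is read from the top-level took field (in milliseconds) of the search response, e.g.:

```
{
  "took": 5123,
  "timed_out": false,
  "_shards": { "total": 15, "successful": 15, "skipped": 0, "failed": 0 },
  "hits": { ... },
  "aggregations": { ... }
}
```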
Elasticsearch is often limited by disk performance and I believe ‘highio’ nodes are superior in that respect. I would usually use ‘highcpu’ nodes for ingest nodes or for search use cases where the full data set fits in the OS cache.
@Christian_Dahlqvist I created a new cluster with the same configuration (RAM and CPU) and the same number of machines, using highio nodes instead of highcpu, and loaded a snapshot of the same ES data as in prod. The results were much the same (the latency actually increased a little).
In that case disk I/O may not be a factor. Your shards are, however, quite large at a bit over 90GB. I would recommend trying a larger number of shards, e.g. 10, in order to see if that makes a difference.
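One way to try this without touching the existing index is to reindex into a new index with more primary shards. A minimal sketch, using hypothetical index names (my_index / my_index_10shards):

```
PUT /my_index_10shards
{
  "settings": {
    "index.number_of_shards": 10,
    "index.number_of_replicas": 2
  }
}

POST /_reindex?wait_for_completion=false
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_10shards" }
}
```

You would also want to apply the existing mappings (including the nested field) to the new index before reindexing, since dynamic mapping will not recreate nested types.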
I just noticed that you are running a nested aggregation. I rarely use this, so I am not sure about its impact on performance. If you are constantly updating documents, which seems to be the case given the stats, there are data structures that need to be recalculated after merges, and I am wondering if this is affecting the results.
Have you considered instead using a flat, non-nested structure?
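To illustrate the difference (same hypothetical tags / tags.name field names as above): with a nested mapping the terms aggregation has to be wrapped in a nested aggregation, whereas with a flat object mapping, or the values copied to a top-level keyword field, it can run directly on the field:

```
// Nested mapping: the aggregation needs a "nested" wrapper
"aggs": {
  "by_tag": {
    "nested": { "path": "tags" },
    "aggs": {
      "tag_names": { "terms": { "field": "tags.name" } }
    }
  }
}

// Flat alternative: aggregate on the field directly
"aggs": {
  "tag_names": { "terms": { "field": "tags.name" } }
}
```

Whether the flat structure is feasible depends on whether you rely on the per-object matching semantics that nested fields provide.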