We have a massive, multi-level aggregation that consumes a lot of memory and CPU in our ES cluster. The cluster consists of 3 master nodes and 4 data nodes. Each node has an 8G heap, with 40% allocated to the field data cache. We have monthly indices with 6 shards each, and each index contains about 100,000,000 documents. The aggregation is 5 levels deep and consists of a mixture of terms, date_histogram, percentiles and stats aggregations. As mentioned, running the aggregation consumes a lot of memory and CPU and impacts other index/search operations while it is running, which usually takes about a minute. With the current number of terms, one request consumes about 30-40% of the total heap.
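For reference, a request of roughly this shape is what I mean by a 5-level aggregation. The field names here are just placeholders, not our actual mapping:

```json
{
  "size": 0,
  "aggs": {
    "by_customer": {
      "terms": { "field": "customer_id" },
      "aggs": {
        "per_month": {
          "date_histogram": { "field": "timestamp", "interval": "month" },
          "aggs": {
            "by_status": {
              "terms": { "field": "status" },
              "aggs": {
                "by_region": {
                  "terms": { "field": "region" },
                  "aggs": {
                    "latency_pct":   { "percentiles": { "field": "latency" } },
                    "latency_stats": { "stats": { "field": "latency" } }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Each terms level multiplies the number of buckets, which is presumably why the heap cost grows so quickly with the number of top-level terms.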
- Will adding "client" nodes (master: false, data: false) and routing aggregation requests through these nodes offload some of the processing from the data nodes, so that indexing and other search requests can perform as normal on the data nodes?
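To be concrete, by a "client" node I mean one configured like this in elasticsearch.yml, so it only coordinates requests and does the final reduce phase:

```yaml
# elasticsearch.yml on the coordinating-only ("client") node
node.master: false
node.data: false
```

My understanding is that the per-shard work would still run on the data nodes, and only the merging of shard results would move to the client node, but I would like that confirmed.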
Also, the number of documents and top-level terms will continue to grow, so at some point in the future the number of top-level buckets will be a lot higher than today, which leads to the next questions:
- Will Elasticsearch be able to handle an aggregation this complex at all when the number of documents/terms increases?
- What can we do to optimize the memory usage of multi-level aggregations?
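One thing we have considered, but not yet tried, is enabling doc_values on the fields we aggregate on, so the field data lives on disk instead of in the fielddata cache on the heap. If I understand the docs correctly, for string fields this requires not_analyzed, e.g. (field and type names are again placeholders):

```json
{
  "mappings": {
    "event": {
      "properties": {
        "customer_id": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}
```

Would this meaningfully reduce the heap pressure for an aggregation like ours, or does the bucket structure itself dominate the memory usage?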
Nils-Helge Garli Hegvik