I'm doing some benchmarking of aggregations on a dataset of about 50 million documents. It's quite a complex aggregation, nesting 5 levels deep: a terms aggregation at the top level, with a combination of terms, stats and percentiles aggregations on the nested levels. With the number of shards set to 15 and 20GB of heap on a 3-node cluster in EC2 (m3.2xlarge with provisioned-IOPS EBS), the aggregation takes about 90 seconds, with considerable memory consumption and CPU usage around 50%.
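To give an idea of the shape of the request, here is a heavily simplified sketch (index and field names are placeholders, and the real request nests to 5 levels):

  # Simplified shape only: top-level terms, a nested terms level, and stats + percentiles leaves.
  curl -XPOST "localhost:9200/my_index/_search" -d '
  {
    "size": 0,
    "aggs": {
      "by_category": {
        "terms": { "field": "category" },
        "aggs": {
          "by_region": {
            "terms": { "field": "region" },
            "aggs": {
              "response_stats": { "stats": { "field": "response_time" } },
              "response_percentiles": { "percentiles": { "field": "response_time" } }
            }
          }
        }
      }
    }
  }'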
Doing this experiment leads me to some questions:

- Will setting routing according to the terms in the top-level aggregation have any impact? I don't know how the aggregations work internally, but my assumption is that this would make sure all the data for the sub-aggregations is on the same node. (A sketch of what I mean follows below this list.)
- Does the aggregation operate on the entire document, or does it only load the fields that are aggregated?
- Moving from a local single-node dev environment to a three-node cluster didn't yield as much improvement as I expected. Am I hitting any "limits" with regard to aggregations of this complexity?
- Are there any other optimizations that could affect the performance?
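To clarify the routing question, this is roughly what I have in mind (placeholder names again): route each document by the value of the top-level terms field at index time, then pass the same routing value at query time so only the shard holding that value is searched.

  # Index a document with a routing value taken from the top-level terms field:
  curl -XPUT "localhost:9200/my_index/my_type/1?routing=categoryA" -d '
  { "category": "categoryA", "region": "eu", "response_time": 42 }'

  # Query with the same routing value, so only the shard(s) for that category are hit:
  curl -XPOST "localhost:9200/my_index/_search?routing=categoryA" -d '
  {
    "size": 0,
    "aggs": {
      "by_region": { "terms": { "field": "region" } }
    }
  }'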