Optimizations for nested aggregations


(Nils Helge Garli Hegvik) #1

I'm doing some benchmarking of aggregations on a dataset of about
50,000,000 documents. It's quite a complex aggregation, nested 5 levels
deep: at the top level is a terms aggregation, and the nested levels
combine terms, stats and percentiles aggregations. With the number of
shards set to 15 and 20GB of heap on a 3-node cluster in EC2 (m3.2xlarge
with provisioned-IOPS EBS), the aggregation takes about 90 seconds, with
considerable memory consumption and about 50% CPU usage. This experiment
leads me to some questions:

  • Will setting routing according to the terms in the top-level aggregation
    have any impact? I don't know how the aggregations work, but my assumption
    is that this would ensure that all data for the sub-aggregations is on the
    same node.
  • Does the aggregation operate on the entire document, or does it only load
    the fields that are aggregated?
  • Moving from a local dev environment with one node to a three-node cluster
    didn't yield as much improvement as I expected. Am I hitting any "limits"
    with regard to aggregations of this complexity?
  • Are there any other optimizations that could affect the performance?
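
To illustrate, the aggregation has roughly this shape (the field and
aggregation names here are made up for the example, not the real ones, and
I've only shown a couple of the nesting levels):

```
GET /myindex/_search?search_type=count
{
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "by_region": {
          "terms": { "field": "region" },
          "aggs": {
            "value_stats": { "stats": { "field": "value" } },
            "value_percentiles": { "percentiles": { "field": "value" } }
          }
        }
      }
    }
  }
}
```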

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cc5eb156-d2f7-4a39-af77-83b9c0a42c7a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

On Fri, May 23, 2014 at 10:50 AM, nilsga@gmail.com wrote:

I'm doing some benchmarking of aggregations on a dataset of about
50,000,000 documents. It's quite a complex aggregation, nested 5 levels
deep: at the top level is a terms aggregation, and the nested levels
combine terms, stats and percentiles aggregations. With the number of
shards set to 15 and 20GB of heap on a 3-node cluster in EC2 (m3.2xlarge
with provisioned-IOPS EBS), the aggregation takes about 90 seconds, with
considerable memory consumption and about 50% CPU usage. This experiment
leads me to some questions:

  • Will setting routing according to the terms in the top-level aggregation
    have any impact? I don't know how the aggregations work, but my assumption
    is that this would ensure that all data for the sub-aggregations is on the
    same node.

What is certain is that it will help with accuracy, since you won't suffer
from this limitation of the terms aggregation anymore:
https://github.com/elasticsearch/elasticsearch/issues/1305 (the issue is
about facets, but aggregations work in a similar way).

Regarding performance, it should help, since each shard will have 15x fewer
terms to take care of. I say "should" because, depending on the cardinality
and distribution of the field in your top-level terms aggregation, this
might cause your shards to be less evenly balanced, and the shard with the
most documents might become a bottleneck.
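
As a sketch, routing works by supplying the same routing value at index
time and at search time, so that all documents sharing that value end up in
a single shard (the index name, type and routing value below are
hypothetical):

```
PUT /myindex/doc/1?routing=acme
{ "customer": "acme", "value": 42 }

GET /myindex/_search?routing=acme
{ "aggs": { "by_customer": { "terms": { "field": "customer" } } } }
```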

  • Does the aggregation operate on the entire document, or does it only load
    the fields that are aggregated?

Only fields that are aggregated.

  • Moving from a local dev environment with one node to a three-node
    cluster didn't yield as much improvement as I expected. Am I hitting any
    "limits" with regard to aggregations of this complexity?
  • Are there any other optimizations that could affect the performance?
Regarding these last two questions, I would recommend trying an upgrade to
Elasticsearch 1.2.0 to see if it makes things better. We have put effort
into improving the memory footprint of multi-level aggregations (which
might in turn improve performance), as well as the performance of terms
aggregations thanks to global ordinals. See
http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/ for more
information.

Since you mentioned percentiles, I'd like to note that although useful,
they tend to be costly to compute. You can try playing with the compression
setting of this aggregation (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html)
to see if it helps performance: it will make the aggregation faster and
more memory-efficient at the cost of some accuracy loss. The default value
is 100, so you could try e.g. 10 to see if it yields any performance
improvement.
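
For example, assuming a hypothetical load_time field, the compression is
set directly on the percentiles aggregation:

```
{
  "aggs": {
    "load_time_percentiles": {
      "percentiles": {
        "field": "load_time",
        "compression": 10
      }
    }
  }
}
```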

--
Adrien Grand



(Nils Helge Garli Hegvik) #3

Thank you for your reply.

Here are some observations from a couple of days testing:

  • Setting up routing manually reduced the aggregation time by about 40%!
  • ... however, manual routing caused the data to be distributed unevenly.
    I assume we could have taken steps to improve the distribution, but we
    didn't investigate any further
  • Upgrading from 1.1.0 to 1.2.0 didn't seem to improve either speed or
    memory usage, although we didn't do any accurate measurements of RAM usage
  • Changing the compression value for percentiles did indeed have an effect
  • Increasing the number of nodes from 3 to 5 didn't seem to improve
    performance

Since adding additional nodes didn't seem to improve performance, there
seems to be a bottleneck somewhere. The result of the aggregation is very
large (as JSON, it comes to about a million lines of text), so maybe data
transfer or constructing the result could be the bottleneck?



(Otis Gospodnetić) #4

Massive JSON responses could indeed be a problem. I think you can easily
see whether CPU, disk, or network is the bottleneck using really any
monitoring tool. Even dstat --cpu --mem --disk --net will give you an
idea. :)

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Nils Helge Garli Hegvik) #5

We're using the Java API. I assume that it is using a binary representation
of some kind that is more compact than JSON. I just mentioned JSON to
illustrate the size of the response. I'll certainly try to monitor
disk/network activity.

Nils-H


