@uboness how to improve the accuracy of terms aggregation

yang_ming · December 18, 2014, 7:27am

Hi All

We use the terms aggregation to get the top n authors, but the aggregation may not return the true top n authors. As the Elasticsearch guide says, the aggregated results are not always accurate.

We can increase the shard size to get more accurate results, but if each shard returns a large number of buckets, the master node becomes a bottleneck when it reduces the final result.

Is there another way to improve the accuracy of the terms aggregation?

Is there a good way to decrease the pressure on the master node during the reduce phase?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0b83a2d6-8bd0-41dc-9e58-3b797949ca53%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · December 18, 2014, 2:37pm

For the record, the bottleneck would not be on the master node (the node
that manages the cluster state) but on the node that coordinates the
execution of the search request, which is the node that your client
contacts. So if you are doing costly terms aggregations with high shard
sizes, it would help to round-robin between several nodes.
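
To illustrate the round-robin idea, here is a minimal Python sketch. The node addresses, the index name `posts`, and the field name `author` are assumptions made up for the example, and actually sending the request is left to whatever HTTP client you use. It only shows how a request body with `size` and `shard_size` could be built and how the target node could rotate between requests, so that no single node does all of the reduce work.

```python
import itertools
import json

# Hypothetical node addresses; replace with the HTTP endpoints of your own nodes.
NODES = ["http://es-node1:9200", "http://es-node2:9200", "http://es-node3:9200"]


def top_authors_body(size, shard_size):
    """Build a terms aggregation body for the top `size` authors.

    The field name "author" is an assumption for this example.
    """
    return {
        "size": 0,  # no search hits needed, only the aggregation
        "aggs": {
            "top_authors": {
                "terms": {"field": "author", "size": size, "shard_size": shard_size}
            }
        },
    }


def round_robin_targets(nodes, index):
    """Yield search URLs forever, rotating over the given nodes."""
    for node in itertools.cycle(nodes):
        yield f"{node}/{index}/_search"


# Each request goes to the next node, which then acts as the coordinating node.
targets = round_robin_targets(NODES, "posts")
for _ in range(4):
    print(next(targets), json.dumps(top_authors_body(10, 200)))
```

Many client libraries can rotate over a list of hosts on their own, so check your client before writing this by hand.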

If you are interested in the accuracy issues of the terms aggregation, I
would recommend reading

and upgrading to elasticsearch 1.4 which now returns an error bound on the
counts, so that you know how bad the counts might be. The only way to
improve accuracy is to increase the shard size, but as you noted, this
raises issues too.
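
To make the trade-off concrete, here is a small pure-Python simulation. It is a simplified model, not Elasticsearch's actual code: each "shard" is a `Counter` of author counts, each shard reports only its top `shard_size` terms, and the coordinating node sums what it receives. The error bound is modelled as the sum of the count of the last term each truncated shard returned, which is my understanding of how the reported bound is derived. All names and data are invented for the example.

```python
from collections import Counter


def reduce_terms(shards, size, shard_size):
    """Merge per-shard top terms the way a coordinating node would (simplified).

    Returns the merged top `size` terms and an upper bound on the count
    that any term could be missing.
    """
    merged = Counter()
    error_bound = 0
    for counts in shards:
        top = counts.most_common(shard_size)  # what this shard reports
        for term, n in top:
            merged[term] += n
        # A term the shard left out can have at most the count of the
        # last term it did return.
        if len(counts) > shard_size and top:
            error_bound += top[-1][1]
    return merged.most_common(size), error_bound


# "carol" is second on both shards but first overall (9 + 9 = 18).
shards = [
    Counter({"alice": 11, "carol": 9}),
    Counter({"bob": 10, "carol": 9}),
]

print(reduce_terms(shards, size=1, shard_size=1))  # ([('alice', 11)], 21)
print(reduce_terms(shards, size=1, shard_size=2))  # ([('carol', 18)], 0)
```

With `shard_size=1` the true top author is never reported by any shard, and the large error bound is the hint that the result cannot be trusted. With `shard_size=2` every term reaches the coordinating node, so the result is exact, at the cost of more buckets to transfer and merge.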

--
Adrien Grand

