@uboness how to improve the accuracy of terms aggregation

Hi All

We use the terms aggregation to get the top n authors, but the
aggregation may not return the true top n authors. As the Elasticsearch
guide says, the aggregated results are not always accurate.

We can increase the shard size to get more accurate results, but if each
shard returns a large number of buckets, reducing the final result will
become a bottleneck on the master node.

Is there another way to improve the accuracy of the terms aggregation?

Is there a good way to reduce the pressure on the master node during the
reduce phase?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0b83a2d6-8bd0-41dc-9e58-3b797949ca53%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

For the record, the bottleneck would not be on the master node (the node
that manages the cluster state) but on the node that coordinates the
execution of the search request, which is the node that your client
contacts. So if you are doing costly terms aggregations with high shard
sizes, it would help to round-robin between several nodes.
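
To illustrate the round-robin idea, here is a minimal Python sketch. The node
addresses, the index name "posts" and the RoundRobinNodes helper are all
hypothetical, made up for this example; it only shows how successive search
requests could be pointed at different coordinating nodes, and it does not
send anything over the network.

```python
from itertools import cycle

# Hypothetical node addresses; replace them with the HTTP endpoints of your own nodes.
NODES = [
    "http://es-node1:9200",
    "http://es-node2:9200",
    "http://es-node3:9200",
]


class RoundRobinNodes:
    """Hands out node URLs in turn so each search is coordinated by a different node."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = cycle(nodes)

    def next_search_url(self, index):
        # The node that receives the request coordinates it and runs the final reduce.
        return f"{next(self._nodes)}/{index}/_search"


picker = RoundRobinNodes(NODES)
# Four consecutive requests: the fourth wraps around to the first node again.
urls = [picker.next_search_url("posts") for _ in range(4)]
```

Many client libraries can already spread requests over several hosts when
given a list of them, so check your client before writing this by hand.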

If you are interested in the accuracy issues of the terms aggregation, I
would recommend reading
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-approximate-counts
and upgrading to Elasticsearch 1.4, which now returns an error bound on the
counts, so that you know how far off the counts might be. The only way to
improve accuracy is to increase the shard size (the shard_size parameter),
but as you noted, this raises issues too.
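
As a rough sketch of how the shard size and the error bound fit together, the
Python below builds a terms aggregation request body and reads the error
fields out of a response. The field name "author", the aggregation name
"top_authors", both helper functions and the sample response (including its
numbers) are assumptions made up for illustration; only the shape of the
request and of the error fields follows the terms aggregation documentation
linked above.

```python
def top_authors_request(n, shard_size):
    """Build a search body asking for the top n authors.

    "author" and "top_authors" are hypothetical names for this example.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "top_authors": {
                "terms": {
                    "field": "author",
                    "size": n,
                    # Each shard returns this many candidate terms; higher is more
                    # accurate but gives the coordinating node more to reduce.
                    "shard_size": shard_size,
                    # Ask for a per-term error bound as well as the overall one.
                    "show_term_doc_count_error": True,
                }
            }
        },
    }


def count_ranges(agg):
    """Return (key, lowest, highest) possible document counts for each bucket."""
    ranges = []
    for bucket in agg["buckets"]:
        error = bucket.get("doc_count_error_upper_bound", 0)
        ranges.append((bucket["key"], bucket["doc_count"], bucket["doc_count"] + error))
    return ranges


body = top_authors_request(n=10, shard_size=100)

# Made-up response fragment for the "top_authors" aggregation, not real output.
sample = {
    "doc_count_error_upper_bound": 46,
    "sum_other_doc_count": 79,
    "buckets": [
        {"key": "alice", "doc_count": 100, "doc_count_error_upper_bound": 0},
        {"key": "bob", "doc_count": 50, "doc_count_error_upper_bound": 10},
    ],
}
ranges = count_ranges(sample)
```

If the aggregation-level doc_count_error_upper_bound comes back as 0, every
shard returned enough terms for the counts to be exact; otherwise it tells you
how many documents a term missing from the list could still have.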

On Thu, Dec 18, 2014 at 8:27 AM, yang ming ymbloy@gmail.com wrote:



--
Adrien Grand
