I have been experimenting with the performance of different Elasticsearch queries. I want to fetch all the unique values for a particular field. As the number of unique values increases, the time taken by the terms aggregation increases too.
So I decided to split the results into a number of partitions, based on the result of a cardinality query. This improved the performance of the ES queries. I feel the query could be optimised further if I could somehow disable the ordering of the returned buckets. Currently, even within a partition, the buckets are sorted by doc_count (the default behaviour). My use case doesn't require any sorting. Is there a way to disable sorting for the terms aggregation?
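For reference, the partitioned approach described above might look roughly like the sketch below. It only builds the request bodies as plain dicts and does not talk to a cluster. The field name `user_id`, the aggregation name `unique_values`, and the target of 10,000 buckets per partition are placeholders, not values from my actual setup.

```python
def num_partitions_for(cardinality, buckets_per_partition):
    """Ceiling division: how many partitions are needed so that each
    one holds roughly `buckets_per_partition` unique values."""
    return max(1, -(-cardinality // buckets_per_partition))


def cardinality_body(field):
    """Request body for the cardinality query (approximate unique count)."""
    return {
        "size": 0,
        "aggs": {"unique_count": {"cardinality": {"field": field}}},
    }


def partitioned_terms_body(field, partition, num_partitions, size):
    """Request body for one partition of a terms aggregation.
    `include.partition` / `include.num_partitions` select the slice."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "unique_values": {
                "terms": {
                    "field": field,
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    "size": size,
                }
            }
        },
    }


# Hypothetical usage: "user_id" and the numbers are placeholders.
n = num_partitions_for(cardinality=25_000, buckets_per_partition=10_000)
bodies = [partitioned_terms_body("user_id", p, n, 10_000) for p in range(n)]
```

Each body in `bodies` would then be sent as a separate search request. Since the cardinality aggregation is approximate, the `size` per partition probably needs some headroom.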
This is a new type of agg that basically "scrolls" over the aggregation results in a memory-friendly manner, allowing you to paginate through the returned buckets and retrieve all the results without killing your cluster. The tradeoff is that there is no sorting available, which is fine for your use case.
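The description above matches Elasticsearch's composite aggregation, so the sketch below assumes that is the agg being referred to. It pages through buckets by passing each response's `after_key` back as `after`. The `search` argument is a hypothetical callable that takes a request body and returns the parsed response; with the official Python client you would wrap your own search call in it. The aggregation name `unique_values` and source name `value` are also just illustrative.

```python
def iter_unique_values(search, field, page_size=1000):
    """Yield every unique value of `field`, one page of buckets at a time.

    `search` is a caller-supplied function: request body (dict) in,
    parsed Elasticsearch response (dict) out.
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last seen key

        body = {"size": 0, "aggs": {"unique_values": {"composite": composite}}}
        agg = search(body)["aggregations"]["unique_values"]

        buckets = agg["buckets"]
        if not buckets:
            break
        for bucket in buckets:
            yield bucket["key"]["value"]

        after = agg.get("after_key")
        if after is None:
            break


def make_fake_search(values):
    """In-memory stand-in for a cluster, only for trying out the loop."""
    ordered = sorted(set(values))

    def search(body):
        comp = body["aggs"]["unique_values"]["composite"]
        start = 0
        if "after" in comp:
            start = ordered.index(comp["after"]["value"]) + 1
        page = ordered[start:start + comp["size"]]
        agg = {"buckets": [{"key": {"value": v}, "doc_count": 1} for v in page]}
        if page:
            agg["after_key"] = {"value": page[-1]}
        return {"aggregations": {"unique_values": agg}}

    return search
```

Only one page of buckets is held in memory at a time, which is where the memory-friendliness comes from. Buckets come back in the natural order of their keys rather than by doc_count.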