Optimizing term aggregations for multivalued fields

Jared_Miller · July 6, 2016, 6:50pm

I have an ES 2.3.3 index of about 300M documents / 60 primary shards, on a pretty big cluster (8x r3.2xlarge). One of the fields is a multivalued (array) field that usually has 5-20 values per document and a cardinality of about 25,000. It is not_analyzed. I'm not storing the source of the documents at all, because this isn't my main datastore and the primary purpose right now is analytics/aggregations.
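For readers who want to picture the setup, here is a rough sketch of an ES 2.x mapping matching that description, written as a Python dict. The type name `doc` and field name `tags` are made up for illustration; the actual mapping is not shown in this post.

```python
import json

# Hypothetical ES 2.x index body: _source disabled, one not_analyzed
# string field that holds an array of values per document.
# "doc" (type name) and "tags" (field name) are placeholders.
mapping = {
    "mappings": {
        "doc": {
            "_source": {"enabled": False},
            "properties": {
                "tags": {"type": "string", "index": "not_analyzed"}
            },
        }
    }
}

# This JSON would be the body of the index-creation request.
print(json.dumps(mapping, indent=2))
```

Arrays need no special mapping in Elasticsearch; any field can hold multiple values, so the multivalued part does not appear in the mapping itself.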

I'm trying to optimize term aggregations on this field when there are no filters, so the query has to hit all 300M documents ("summarize the whole dataset" type queries). These queries are still a bit slow (5-8 sec), especially if there are any sub-aggregations (10-30 sec).

I am using caching, but it takes 5 or so repetitions of the same query before it starts hitting cache. I've also turned on "collect_mode": "breadth_first" if term or date_histogram sub-aggregations are involved, which helps.
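To make the shape of such a request concrete, here is a minimal sketch of the kind of query described above, built as a Python dict. The field names (`tags`, `category`), the aggregation names, and the helper `build_summary_query` are all hypothetical; only `"collect_mode": "breadth_first"` comes from this post.

```python
import json

def build_summary_query(field, sub_field, top_n=10):
    """Build an unfiltered terms aggregation with one terms sub-aggregation.

    Field and aggregation names are placeholders, not taken from the real index.
    """
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "top_values": {
                "terms": {
                    "field": field,
                    "size": top_n,
                    # prune to the top buckets first, then compute sub-aggregations
                    "collect_mode": "breadth_first",
                },
                "aggs": {
                    "by_sub_value": {
                        "terms": {"field": sub_field, "size": 5}
                    }
                },
            }
        },
    }

body = build_summary_query("tags", "category")
print(json.dumps(body, indent=2))
```

The printed JSON is what would go in the body of a `_search` request. With no `query` clause, the aggregation runs over every document in the index.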

The only other thing I've found that really seems to work is increasing the shard count (and my AWS budget...). Are there any other tricks for speeding up these kinds of summary queries?

Any ideas welcome. Thanks.
