Hi there,
We are testing Elasticsearch to compute large aggregations.
Our queries consist of an aggregation that returns a large number of buckets, which we then feed into other pipeline aggregations to get the results we need.
Our slowest queries run in ~10 s on my laptop. Can this be optimized in a production environment?
If so, what configuration parameters should I tweak, and what would be an appropriate server configuration for such a usage?
Here is a simplified example of one of our queries:
{
  "size": 0,        // skip query hits
  "query": { },     // some query, skipped for this example
  "aggs": {
    "my_large_agg": {    // large aggregation returning ~100k buckets; we are not really interested in the raw results but use them in the sum_bucket aggregation below
      "terms": {
        "field": "some_id",
        "size": 0
      },
      "aggs": {
        "my_large_agg_avg": {
          "avg": {
            "field": "some_field"
          }
        }
      }
    },
    "my_large_agg_sum": {    // pipeline aggregation returning the result we want to return to the end user or use in another piped aggregation
      "sum_bucket": {
        "buckets_path": "my_large_agg>my_large_agg_avg"
      }
    }
  }
}
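To make the intent concrete, here is the equivalent computation written out over plain documents — a minimal Python sketch with made-up sample data; the field names mirror the query, everything else is hypothetical:

```python
from collections import defaultdict

# Hypothetical documents with a grouping key ("some_id") and a
# numeric value ("some_field"), standing in for the indexed data.
docs = [
    {"some_id": "a", "some_field": 10.0},
    {"some_id": "a", "some_field": 20.0},
    {"some_id": "b", "some_field": 5.0},
]

# terms aggregation on "some_id": one bucket of values per distinct id
buckets = defaultdict(list)
for doc in docs:
    buckets[doc["some_id"]].append(doc["some_field"])

# avg sub-aggregation inside each bucket
averages = {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# sum_bucket pipeline aggregation: sum the per-bucket averages
my_large_agg_sum = sum(averages.values())  # avg(a)=15.0, avg(b)=5.0 -> 20.0
```

In our real queries the terms aggregation produces ~100k buckets instead of two, and only the final `my_large_agg_sum` value is of interest.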