This is a question about internals, so I hope someone can answer some admittedly vague questions. I do not have access to the cluster, so I cannot test my hypotheses myself.
First of all, according to the docs:
"The search.max_buckets cluster setting limits the number of buckets allowed in a single response."
Is the limit truly per response, or per aggregation (including its sub-aggregations)? If I have four terms aggregations, each with a size equal to half of max_buckets (and enough distinct terms exist to fill them), would the exception occur?
What constitutes a bucket for a top hits aggregation? Each individual hit, a single bucket for all hits, or none at all?
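To make the two readings concrete, here is a rough arithmetic sketch in Python. The function names and the max_buckets value are placeholders made up for illustration; neither reading is confirmed as what Elasticsearch actually does.

```python
def exceeds_per_response(sizes, max_buckets):
    """Reading 1 (hypothetical): the limit applies to the sum of buckets
    across all aggregations in the response."""
    return sum(sizes) > max_buckets


def exceeds_per_aggregation(sizes, max_buckets):
    """Reading 2 (hypothetical): the limit applies to each aggregation separately."""
    return any(size > max_buckets for size in sizes)


max_buckets = 10_000  # placeholder value, not necessarily the cluster's setting
sizes = [max_buckets // 2] * 4  # four terms aggregations, each half of max_buckets

print(exceeds_per_response(sizes, max_buckets))     # True
print(exceeds_per_aggregation(sizes, max_buckets))  # False
```

Under the first reading the request would fail; under the second it would not, which is why the distinction matters.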
Here is a pseudo aggregation request:
{
  "query": {
    ...
  },
  "aggs": {
    "top_term_agg": {
      "terms": {
        "field": "somefield",
        "size": 5000,
        "min_doc_count": 2
      },
      "aggs": {
        "th_0": {
          "filter": {
            ...
          },
          "aggs": {
            "top_hits": {
              ...
            }
          }
        },
        "th_1": {
          ...
        },
        "min_bucket_selector": {
          "bucket_selector": {
            ...
          }
        }
      }
    }
  }
}
The top-level aggregation contains a variable number of filtered sub-aggregations. The number varies per request, but within a request every term has the same number of sub-aggregations. The response looks something like this:
"aggregations": {
  "top_term_agg": {
    "meta": {
    },
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "somekey",
        "doc_count": 60,
        "th_1": {
          "doc_count": 30,
          "top_hits": {
            "hits": {
              ...
            }
          }
        },
        "th_0": {
          "doc_count": 30,
          "top_hits": {
            "hits": {
              ...
            }
          }
        }
      }
    ]
  }
}
Since each term in the top-level aggregation contains two filtered aggregations, would the total number of buckets be 5000*2? Does the top hits aggregation add to the total number of buckets? Does the max_buckets check occur before or after the bucket_selector?
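For reference, the candidate totals I am trying to choose between can be worked out as below. This is only a back-of-the-envelope sketch: the function name, the dictionary keys, and the hits_per_filter value are hypothetical (the actual top hits size is not shown above), it assumes the worst case where all 5000 terms come back, and none of the counting rules is confirmed Elasticsearch behavior.

```python
def candidate_totals(terms_size, filters_per_term, hits_per_filter=0):
    """Worst-case totals under several hypothetical counting rules."""
    terms_buckets = terms_size                       # one bucket per returned term
    filter_buckets = terms_size * filters_per_term   # one bucket per filter per term
    hit_entries = filter_buckets * hits_per_filter   # only if each hit counted as a bucket
    return {
        "terms_only": terms_buckets,
        "filters_only": filter_buckets,
        "terms_plus_filters": terms_buckets + filter_buckets,
        "everything": terms_buckets + filter_buckets + hit_entries,
    }


# 5000 terms, two filter sub-aggregations each; 3 hits per top hits is an assumed value.
totals = candidate_totals(terms_size=5000, filters_per_term=2, hits_per_filter=3)
for name, total in totals.items():
    print(f"{name}: {total}")
```

Depending on which rule applies, the same request counts anywhere from 5000 to 45000 buckets, so knowing the rule determines whether the request is safe under a given max_buckets.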
I know that I have tools such as increasing max_buckets, lowering shard_size, or using terminate_after, but I am first trying to determine what the limits are, and perhaps advise users on their behavior.