I'm looking for a reliable and complete way to get only the Terms Aggregation buckets whose doc_count falls within a specific range. I'm currently using min_doc_count to constrain the lower end of the range, but that only bounds one end, which is limiting for my use case.
I've already tried nesting a bucket selector aggregation to cap the maximum count. However, because it is nested, it only filters the buckets that the parent terms aggregation has already returned, so my results vary with the value of the terms aggregation's "size" param.
Here's an example of my query against an index containing email addresses with activity "rows" that I am counting:
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "RANGE": {
      "terms": {
        "field": "email",
        "size": 1000,
        "min_doc_count": 20
      },
      "aggs": {
        "sum-for-bucket-selector": {
          "value_count": {
            "field": "email"
          }
        },
        "max-doc-count": {
          "bucket_selector": {
            "buckets_path": {
              "count": "sum-for-bucket-selector"
            },
            "script": {
              "inline": "params.count < 40"
            }
          }
        }
      }
    }
  }
}
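To make the "size" dependence concrete, here is a minimal pure-Python sketch of what I believe is happening. It does not talk to Elasticsearch: the `terms_then_selector` helper and the sample addresses are hypothetical, and it simplifies the real aggregation (no shards, buckets ordered by count only).

```python
from collections import Counter


def terms_then_selector(docs, size, min_doc_count, max_exclusive):
    """Mimic a terms aggregation followed by a nested bucket_selector (simplified)."""
    counts = Counter(docs)
    # Terms aggregation: drop buckets below min_doc_count, keep the top `size` by doc_count.
    buckets = [(t, c) for t, c in counts.most_common() if c >= min_doc_count][:size]
    # Bucket selector: only sees the buckets that survived the step above.
    return [(t, c) for t, c in buckets if c < max_exclusive]


# Hypothetical data: 5 very active addresses (50 rows each)
# and 5 addresses inside the target range (25 rows each).
docs = []
for i in range(5):
    docs += [f"heavy{i}@example.com"] * 50
for i in range(5):
    docs += [f"mid{i}@example.com"] * 25

small = terms_then_selector(docs, size=5, min_doc_count=20, max_exclusive=40)
large = terms_then_selector(docs, size=10, min_doc_count=20, max_exclusive=40)
print(len(small), len(large))  # prints "0 5"
```

With a small "size", the most active addresses fill every slot and the selector then discards all of them, so the addresses that are actually in range never appear.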
I would also like to page the result set. I've attempted this with partitioning, but that would require me to know ahead of time how many buckets the Terms Aggregation with the nested bucket selector aggregation will return; otherwise I get sparsely populated partitions rather than true paged results.
Here's an example of that query:
{
  "from": 0,
  "size": 0,
  "aggs": {
    "RANGE": {
      "terms": {
        "field": "email",
        "include": {
          "partition": 0,
          "num_partitions": 10
        },
        "size": 500,
        "min_doc_count": 1000
      }
    }
  }
}
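The sparse-partition effect can be sketched the same way. This is again only a pure-Python illustration under assumptions: `partition_of` uses CRC32 as a stand-in hash (Elasticsearch uses its own hash internally), and the addresses and counts are made up.

```python
import zlib
from collections import Counter


def partition_of(term, num_partitions):
    # Stand-in hash for illustration; not the function Elasticsearch actually uses.
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def partition_page(counts, partition, num_partitions, min_doc_count):
    """Return the terms in one partition that also satisfy min_doc_count."""
    return sorted(
        term
        for term, count in counts.items()
        if partition_of(term, num_partitions) == partition and count >= min_doc_count
    )


# Hypothetical data: 200 addresses, only every 10th one is active enough to qualify.
counts = Counter(
    {f"user{i}@example.com": (1000 if i % 10 == 0 else 1) for i in range(200)}
)

num_partitions = 10
pages = [
    partition_page(counts, p, num_partitions, min_doc_count=1000)
    for p in range(num_partitions)
]
# Terms are assigned to partitions before the doc_count filter is applied,
# so each "page" ends up with however many qualifying terms happened to hash there.
print([len(page) for page in pages])
```

Every qualifying address shows up in exactly one partition, but the number per partition is not something I can control, which is why the partitions do not behave like fixed-size pages.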
Any recommendations on other options for achieving this? Thanks in advance!