When I nest a significant_terms aggregation inside another aggregation (e.g. terms), I get different bg_count values for the same significant term across the outer term buckets.
Say, for example, the outer terms agg is on a field with US state codes ("CA", "FL", "NY", etc.) and the nested significant_terms agg is on a field with the sport each person plays ("tennis", "golf", "skiing", etc.).
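For illustration, a request of this shape could be built roughly as follows. This is only a sketch: the field names `state` and `sport`, the helper `build_request`, and the shard_size value are hypothetical stand-ins, since the post does not give the actual ones. The aggregation names match those in the response.

```python
import json


def build_request(outer_field, inner_field, shard_size):
    """Build a search body: a terms agg with a nested significant_terms agg.

    All arguments are placeholders; the real field names and shard_size
    are not stated in the post.
    """
    return {
        "size": 0,  # only the aggregations are of interest, not the hits
        "aggs": {
            "frequentTerms": {
                "terms": {"field": outer_field},
                "aggs": {
                    "significantTerms": {
                        "significant_terms": {
                            "field": inner_field,
                            "shard_size": shard_size,
                            "min_doc_count": 1,
                            "shard_min_doc_count": 1,
                        }
                    }
                },
            }
        },
    }


# Hypothetical values, for illustration only.
body = build_request("state", "sport", 10000)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request.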
I see the following types of results:
"aggregations": {
  "frequentTerms": {
    "buckets": [
      {
        "key": "NY",
        "doc_count": 2027,
        "significantTerms": {
          "doc_count": 2027,
          "buckets": [
            {
              "key": "sailing",
              "doc_count": 80,
              "score": 0.029240945633836113,
              "bg_count": 80
            },
            {
              "key": "golf",
              "doc_count": 77,
              "score": 0.02907984745352633,
              "bg_count": 77
            }
          ]
        }
      },
      {
        "key": "CA",
        "doc_count": 100,
        "significantTerms": {
          "doc_count": 100,
          "buckets": [
            {
              "key": "golf",
              "doc_count": 42,
              "score": 0.02301730117174594,
              "bg_count": 18
            },
            {
              "key": "tennis",
              "doc_count": 42,
              "score": 0.012398130001513895,
              "bg_count": 9
            }
          ]
        }
      }
    ]
  }
}
I would expect the bg_count for "golf" to be identical in the two buckets (states), but it is 77 under "NY" and 18 under "CA". I have set shard_size to a very high number, and both min_doc_count and shard_min_doc_count to 1, but this had no effect.
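To make the discrepancy easy to spot in larger responses, a check along these lines could be used. It is a minimal sketch: the helper name `inconsistent_bg_counts` is hypothetical, and the sample data is the response from this post trimmed to the keys and bg_count values.

```python
from collections import defaultdict


def inconsistent_bg_counts(aggregations, outer="frequentTerms", inner="significantTerms"):
    """Return the terms whose bg_count differs between outer buckets."""
    seen = defaultdict(set)
    for bucket in aggregations[outer]["buckets"]:
        for term in bucket[inner]["buckets"]:
            seen[term["key"]].add(term["bg_count"])
    # Keep only terms reported with more than one distinct bg_count.
    return {key: sorted(counts) for key, counts in seen.items() if len(counts) > 1}


# The response above, reduced to the fields the check needs.
aggregations = {
    "frequentTerms": {
        "buckets": [
            {"key": "NY", "significantTerms": {"buckets": [
                {"key": "sailing", "bg_count": 80},
                {"key": "golf", "bg_count": 77},
            ]}},
            {"key": "CA", "significantTerms": {"buckets": [
                {"key": "golf", "bg_count": 18},
                {"key": "tennis", "bg_count": 9},
            ]}},
        ]
    }
}

print(inconsistent_bg_counts(aggregations))  # {'golf': [18, 77]}
```

Only "golf" is flagged, because it is the only term that appears under more than one state.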
Any insights would be much appreciated.
Thanks, Petter.