Bg_counts in nested significant_terms aggregation

When using a significant_terms aggregation nested inside another aggregation (e.g. terms), I get different bg_count values for the same significant term in different outer buckets.
Say, for example, the outer terms agg is on a field with US state codes ("CA", "FL", "NY", etc.) and the nested significant_terms agg is on a field with the sports people practice (e.g. "tennis", "golf", "skiing", etc.).
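For reference, a request of this shape might look roughly like the sketch below. The agg names frequentTerms and significantTerms are taken from the response in this report. The field names "state" and "sport" and the shard_size value of 10000 are hypothetical placeholders, and min_doc_count / shard_min_doc_count are set to the values the reporter says they tried.

```python
import json


def build_request(state_field: str = "state", sport_field: str = "sport",
                  shard_size: int = 10000) -> dict:
    """Terms agg on states with a significant_terms sub-agg on sports.

    Field names and the shard_size default are hypothetical placeholders.
    """
    return {
        "size": 0,  # only the aggregations are of interest, not the hits
        "aggs": {
            "frequentTerms": {
                "terms": {"field": state_field},
                "aggs": {
                    "significantTerms": {
                        "significant_terms": {
                            "field": sport_field,
                            "shard_size": shard_size,  # "a very high number"
                            "min_doc_count": 1,
                            "shard_min_doc_count": 1,
                        }
                    }
                },
            }
        },
    }


print(json.dumps(build_request(), indent=2))
```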
I see the following types of results:

    "aggregations": {
      "frequentTerms": {
        "buckets": [
          {
            "key": "NY",
            "doc_count": 2027,
            "significantTerms": {
              "doc_count": 2027,
              "buckets": [
                {
                  "key": "sailing",
                  "doc_count": 80,
                  "score": 0.029240945633836113,
                  "bg_count": 80
                },
                {
                  "key": "golf",
                  "doc_count": 77,
                  "score": 0.02907984745352633,
                  "bg_count": 77
                }
              ]
            }
          },
          {
            "key": "CA",
            "doc_count": 100,
            "significantTerms": {
              "doc_count": 100,
              "buckets": [
                {
                  "key": "golf",
                  "doc_count": 42,
                  "score": 0.02301730117174594,
                  "bg_count": 18
                },
                {
                  "key": "tennis",
                  "doc_count": 42,
                  "score": 0.012398130001513895,
                  "bg_count": 9
                }
              ]
            }
          }
        ]
      }
    }

I would expect the bg_count for "golf" to be identical in both buckets (states). I have set shard_size to a very high number and both min_doc_count and shard_min_doc_count to 1, but it made no difference.
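With no background_filter set, significant_terms uses the whole index as its background, so a term's bg_count should not depend on which outer bucket it shows up in. The sketch below checks that directly on a parsed response. It hard-codes the response from this report, trimmed to the fields it uses, and the helper names are hypothetical.

```python
from collections import defaultdict

# The response from this report, trimmed to the fields the check uses.
response = {
    "aggregations": {
        "frequentTerms": {
            "buckets": [
                {"key": "NY", "significantTerms": {"buckets": [
                    {"key": "sailing", "doc_count": 80, "bg_count": 80},
                    {"key": "golf", "doc_count": 77, "bg_count": 77},
                ]}},
                {"key": "CA", "significantTerms": {"buckets": [
                    {"key": "golf", "doc_count": 42, "bg_count": 18},
                    {"key": "tennis", "doc_count": 42, "bg_count": 9},
                ]}},
            ]
        }
    }
}


def bg_counts_by_term(resp, outer="frequentTerms", inner="significantTerms"):
    """Map each significant term to the set of bg_count values reported for it."""
    seen = defaultdict(set)
    for outer_bucket in resp["aggregations"][outer]["buckets"]:
        for term_bucket in outer_bucket[inner]["buckets"]:
            seen[term_bucket["key"]].add(term_bucket["bg_count"])
    return seen


def inconsistent_bg_counts(resp):
    """Terms whose bg_count differs between outer buckets."""
    return {term: counts for term, counts in bg_counts_by_term(resp).items()
            if len(counts) > 1}


print(inconsistent_bg_counts(response))
```

On this response only "golf" is flagged, with the two bg_count values 77 and 18.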

Any insights would be very appreciated.

Thanks, Petter.

I see the same problem, and something even stranger: the _superset_freq is greater than the doc_count...
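Counts like these can also be caught with a simple invariant check. With the default whole-index background, every foreground document is also a background document, so within a bucket a term's doc_count should never exceed its bg_count, and a term's bg_count should never exceed the size of the background set. The function below is a hypothetical helper that flags both. The background size is passed in separately because it does not appear in the response shown in this report.

```python
def frequency_violations(term_buckets, background_size=None):
    """Return (term, reason) pairs for buckets whose counts cannot all be right.

    Assumes the background is a superset of the foreground, which holds for
    the default whole-index background.
    """
    problems = []
    for b in term_buckets:
        # A subset cannot contain a term more often than its superset does.
        if b["doc_count"] > b["bg_count"]:
            problems.append((b["key"], "doc_count > bg_count"))
        # A term cannot occur in more background docs than the background holds.
        if background_size is not None and b["bg_count"] > background_size:
            problems.append((b["key"], "bg_count > background size"))
    return problems


# The CA bucket from the response in this report.
ca_buckets = [
    {"key": "golf", "doc_count": 42, "bg_count": 18},
    {"key": "tennis", "doc_count": 42, "bg_count": 9},
]
print(frequency_violations(ca_buckets))
```

Both CA terms are flagged for doc_count > bg_count. The optional background_size check covers the reading where a term's superset frequency exceeds the number of documents in the background.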

I suspect the problem is related to nested docs.
They physically exist as separate docs in Lucene (where we get some of our term frequency stats from) but are accounted for differently in elasticsearch, e.g. in top-level aggs, where we like to pretend they don't exist.
The Lucene APIs we rely on for fast access to frequencies are already subject to inaccuracies from things like deleted documents, but it looks like nested docs are another source of potential inaccuracy.
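The mechanism described above can be illustrated with a toy model. This is not Elasticsearch's or Lucene's actual code. It assumes the sport values live in nested objects, so each one is indexed as its own Lucene document next to its parent. It then compares a raw per-document count, which, like Lucene's docFreq, also includes deleted but not yet merged documents, with a count of live top-level documents. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LuceneDoc:
    root_id: int                  # top-level document this Lucene doc belongs to
    is_root: bool                 # False for a nested (child) document
    deleted: bool = False         # deleted but not yet merged away
    terms: frozenset = frozenset()


def lucene_doc_freq(docs, term):
    """Toy docFreq: every physical doc containing the term, nested or deleted."""
    return sum(1 for d in docs if term in d.terms)


def root_doc_freq(docs, term):
    """Live top-level docs whose block (root plus nested children) has the term."""
    live_roots = {d.root_id for d in docs if d.is_root and not d.deleted}
    return len({d.root_id for d in docs
                if term in d.terms and d.root_id in live_roots})


docs = [
    # Person 1 has two nested "golf" entries.
    LuceneDoc(1, False, terms=frozenset({"golf"})),
    LuceneDoc(1, False, terms=frozenset({"golf"})),
    LuceneDoc(1, True),
    # Person 2 plays tennis.
    LuceneDoc(2, False, terms=frozenset({"tennis"})),
    LuceneDoc(2, True),
    # Person 3 was deleted, but the segment has not been merged yet.
    LuceneDoc(3, False, deleted=True, terms=frozenset({"golf"})),
    LuceneDoc(3, True, deleted=True),
]

print(lucene_doc_freq(docs, "golf"), root_doc_freq(docs, "golf"))
```

Here the raw count for "golf" is 3 while only one live person plays golf. The index also holds 7 physical documents but only 2 live top-level ones, so frequencies and set sizes taken from the two views do not line up.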