The significant terms aggregation returns an incorrect bg_count value when querying an index that contains nested objects. The value matches the document count returned by the _cat/indices API (which reports the Lucene-level doc count). I assume the aggregation result is also incorrect, since significance is calculated from bg_count.
I'd like to know whether this is expected behaviour. If it is, it should be documented. The documentation for a similar feature, the significant text aggregation, states that it doesn't support nested objects, so I wonder whether the same limitation applies to the significant terms aggregation.
Additionally, the behaviour changed in 8.4. Starting from 8.4, the aggregation returns the correct bg_count (the count that can be confirmed with the count API).
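To illustrate the gap between the two counts, here is a minimal Python sketch. It assumes that each nested object is indexed as its own hidden Lucene document alongside the root document, which is my understanding of how the nested field type works. The helper name `lucene_doc_count` is hypothetical and not part of any Elasticsearch client.

```python
def lucene_doc_count(doc, nested_fields):
    """Estimate how many Lucene documents one source document occupies.

    Assumption: one root document plus one hidden document per nested object.
    """
    count = 1  # the root document itself
    for field in nested_fields:
        count += len(doc.get(field, []))  # one hidden doc per nested object
    return count


# The single document used in the reproduction in this report.
person = {
    "name": "John",
    "friends": [{"name": "John"}, {"name": "Alice"}, {"name": "Tim"}],
}

root_docs = 1  # what the count API would report for this index
lucene_docs = lucene_doc_count(person, nested_fields=["friends"])
print(root_docs, lucene_docs)
```

Under that assumption the index holds 1 root document but 4 Lucene documents, which is the bg_count of 4 reported in the 8.3.3 response.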
In version 8.3.3, the bg_count value for the significant terms aggregation is incorrect, as the following reproduction shows.
GET /

{
  "name": "elastic_integration",
  "cluster_name": "elastic_integration_cci_recipe_search_svc",
  "cluster_uuid": "5JpEuResTGKBbsIyaT5xJw",
  "version": {
    "number": "8.3.3",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "801fed82df74dbe537f89b71b098ccaff88d2c56",
    "build_date": "2022-07-23T19:30:09.227964828Z",
    "build_snapshot": false,
    "lucene_version": "9.2.0",
    "minimum_wire_compatibility_version": "7.17.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "You Know, for Search"
}
PUT person
{
  "mappings": {
    "properties": {
      "name": {"type": "keyword"},
      "friends": {
        "type": "nested",
        "properties": {
          "name": {"type": "keyword"}
        }
      }
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "person"
}
POST person/_doc
{
  "name": "John",
  "friends": [{"name": "John"}, {"name": "Alice"}, {"name": "Tim"}]
}

{
  "_index": "person",
  "_id": "zdIyU4cBPq5OH_vFlUEb",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
POST /person/_search
{
  "aggs": {
    "significant_names": {
      "significant_terms": {"field": "name", "min_doc_count": 1}
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "person",
        "_id": "zdIyU4cBPq5OH_vFlUEb",
        "_score": 1,
        "_source": {
          "name": "John",
          "friends": [
            {
              "name": "John"
            },
            {
              "name": "Alice"
            },
            {
              "name": "Tim"
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "significant_names": {
      "doc_count": 1,
      "bg_count": 4,
      "buckets": [
        {
          "key": "John",
          "doc_count": 1,
          "score": 3,
          "bg_count": 1
        }
      ]
    }
  }
}
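To show why the inflated bg_count matters, here is a small Python sketch of the score calculation. It assumes the default JLH heuristic, which the documentation describes as (foreground% - background%) * (foreground% / background%). The function `jlh_score` is my own illustration, not Elasticsearch code.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """JLH significance score, assuming the documented formula:
    (fg% - bg%) * (fg% / bg%), with 0 for terms that are not more
    frequent in the foreground than in the background.
    """
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size      # share of foreground docs with the term
    bg = superset_freq / superset_size  # share of background docs with the term
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)


# Values from the response above: bucket doc_count 1, top-level doc_count 1,
# bucket bg_count 1, top-level bg_count 4 (Lucene-level count).
reported = jlh_score(1, 1, 1, 4)

# Same term, but with the background size the count API reports (1).
expected = jlh_score(1, 1, 1, 1)

print(reported, expected)
```

With a background size of 4 the sketch reproduces the score of 3 from the response. With a background size of 1 the score would be 0, so the inflated bg_count does appear to change the significance result.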
I've also tested with 7.16 and got the same result.