Significant terms aggregation returns incorrect bg_count value when querying index with nested objects in version 8.3.3

The significant terms aggregation returns an incorrect bg_count value when querying an index with nested objects. The value matches the document count returned by the _cat/indices API, which is the Lucene-level doc count and so includes the hidden documents created for nested objects. I assume the aggregation results are also affected, since the significance score is calculated from bg_count.

I'd like to know if this is expected behaviour or not.

This may be expected behaviour, but if so, it should be documented. The documentation for a similar feature, the significant text aggregation, states that it doesn't support nested objects, so I wonder whether the same limitation applies to the significant terms aggregation.

Additionally, the behaviour changed in 8.4. Starting with 8.4, the aggregation returns the correct bg_count (the count that can be confirmed with the count API).

Here is a reproduction on version 8.3.3, where the bg_count value for the significant terms aggregation is incorrect:

GET /
{
  "name": "elastic_integration",
  "cluster_name": "elastic_integration_cci_recipe_search_svc",
  "cluster_uuid": "5JpEuResTGKBbsIyaT5xJw",
  "version": {
    "number": "8.3.3",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "801fed82df74dbe537f89b71b098ccaff88d2c56",
    "build_date": "2022-07-23T19:30:09.227964828Z",
    "build_snapshot": false,
    "lucene_version": "9.2.0",
    "minimum_wire_compatibility_version": "7.17.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "You Know, for Search"
}

PUT person
{
  "mappings": {
    "properties": {
      "name": {"type": "keyword"},
      "friends": {
        "type": "nested",
        "properties": {
          "name": {"type": "keyword"}
        }
      }
    }
  }
}
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "person"
}

POST person/_doc
{
  "name" : "John",
  "friends" : [{"name" : "John"}, {"name" : "Alice"}, {"name" : "Tim"}]
}
{
  "_index": "person",
  "_id": "zdIyU4cBPq5OH_vFlUEb",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
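To make the Lucene-level count concrete, here is a minimal sketch of the arithmetic for the document above. It relies on the fact that each nested object is indexed as its own hidden Lucene document next to the root document. The helper name `lucene_doc_count` is hypothetical, and the sketch only handles a single level of nesting.

```python
def lucene_doc_count(doc, nested_fields):
    """Count the Lucene documents created for one source document.

    Assumption: one root document plus one hidden document per nested
    object, with a single level of nesting only.
    """
    count = 1  # the root document itself
    for field in nested_fields:
        value = doc.get(field, [])
        if isinstance(value, dict):
            # A single nested object is treated like a one-element array.
            value = [value]
        count += len(value)
    return count


doc = {
    "name": "John",
    "friends": [{"name": "John"}, {"name": "Alice"}, {"name": "Tim"}],
}

print(lucene_doc_count(doc, ["friends"]))  # 4
```

This gives 4 (1 root document + 3 nested friends), whereas the count API would report 1 for the same index.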

POST /person/_search
{
  "aggs": {
    "significant_names": {
      "significant_terms": {"field": "name", "min_doc_count": 1}
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "person",
        "_id": "zdIyU4cBPq5OH_vFlUEb",
        "_score": 1,
        "_source": {
          "name": "John",
          "friends": [
            {
              "name": "John"
            },
            {
              "name": "Alice"
            },
            {
              "name": "Tim"
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "significant_names": {
      "doc_count": 1,
      "bg_count": 4,
      "buckets": [
        {
          "key": "John",
          "doc_count": 1,
          "score": 3,
          "bg_count": 1
        }
      ]
    }
  }
}
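To illustrate why the top-level bg_count matters for the score, here is a rough sketch of the JLH heuristic, which I'm assuming is in effect since the request does not set another heuristic. The function name `jlh_score` and its parameter names are my own, not an Elasticsearch API. It computes (foreground % - background %) * (foreground % / background %) and returns 0 when the term is not more frequent in the foreground.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """Sketch of the JLH significance score.

    subset_freq / subset_size:     term count and total docs in the foreground
    superset_freq / superset_size: term count and total docs in the background
    """
    if subset_size == 0 or superset_size == 0:
        return 0.0
    fg = subset_freq / subset_size
    bg = superset_freq / superset_size
    if fg <= bg:
        # No positive absolute change, so the term is not significant.
        return 0.0
    return (fg - bg) * (fg / bg)


# Values from the 8.3.3 response: bucket doc_count=1, aggregation doc_count=1,
# bucket bg_count=1, aggregation bg_count=4.
print(jlh_score(1, 1, 1, 4))  # 3.0

# Same bucket, but with a background size of 1 (the count API value).
print(jlh_score(1, 1, 1, 1))  # 0.0
```

With a background size of 4 the sketch reproduces the score of 3 shown above, while a background size of 1 would give 0, so the inflated bg_count does appear to change the significance.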

I've also tested with 7.16 and got the same result.

Thanks for providing a detailed reproduction here.

I'd suggest raising this on GitHub directly so an engineer can look into it.

Thank you for the reply.

I've opened issue #95067 in elastic/elasticsearch on GitHub: "The result of significant terms aggregation has changed after upgrading from 8.3.3 to 8.4.0 and above".
