Composite aggregation returns old document versions

Hello. I found that in some cases a composite aggregation returns old documents (the initially indexed version instead of the updated one). Steps to reproduce:

curl -X PUT -H 'Content-Type: application/json' localhost:9200/test -d '{
      "settings": {
        "index": {
          "sort.field": "id"
        }
      },
      "mappings": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          }
        }
      }
    }'
curl -X PUT -H 'Content-Type: application/json' localhost:9200/test/_doc/1 -d '{
  "id": 1,
  "value": "Old Value"
}'
curl -X POST -H 'Content-Type: application/json' localhost:9200/test/_update/1 -d '{
  "doc": {
    "value": "New Value"
  }
}'
curl -X GET -H 'Content-Type: application/json' 'localhost:9200/test/_search?pretty' -d '{
  "aggs": {
    "composite": {
      "composite": {
        "size": 1,
        "sources": [
          { "id": { "terms": { "field": "id" } } }
        ],
        "after": {"id": "0"}
      },
      "aggs": {
        "top_hits": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}'

Response:

{
  "took" : 102,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 1,
          "value" : "New Value"
        }
      }
    ]
  },
  "aggregations" : {
    "composite" : {
      "after_key" : {
        "id" : "1"
      },
      "buckets" : [
        {
          "key" : {
            "id" : "1"
          },
          "doc_count" : 2,
          "top_hits" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : 1.0,
                  "_source" : {
                    "id" : 1,
                    "value" : "Old Value"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

As you can see, the top-level hits contain the current value, but the top_hits inside the composite aggregation bucket return the old value, and the bucket reports a doc_count of 2 for a single document.
The problem disappears when I remove either the index sort setting from the index settings (which I think is important for performance if I want to use pagination) or the "after" property from the query.
Am I doing something wrong, or is this a bug in Elasticsearch?

Elasticsearch version: 7.9.2
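
For comparison, here is a minimal sketch of the two query variants described above: the one with "after", which shows the stale value, and the one without it, which reportedly does not. The helper name build_query is hypothetical and only serves to switch the "after" clause on and off. The index name, field names and localhost:9200 are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): prints the composite
# aggregation query from the reproduction steps.
#   build_query 0   -> includes "after": {"id": "0"}
#   build_query     -> omits the "after" clause
build_query() {
  after_clause=""
  if [ -n "${1:-}" ]; then
    after_clause=", \"after\": {\"id\": \"$1\"}"
  fi
  cat <<EOF
{
  "aggs": {
    "composite": {
      "composite": {
        "size": 1,
        "sources": [
          { "id": { "terms": { "field": "id" } } }
        ]${after_clause}
      },
      "aggs": {
        "top_hits": { "top_hits": { "size": 1 } }
      }
    }
  }
}
EOF
}

# Usage against a local node (assumes the "test" index from the steps above):
#   build_query 0 | curl -s -X GET -H 'Content-Type: application/json' \
#     'localhost:9200/test/_search?pretty' -d @-
#   build_query   | curl -s -X GET -H 'Content-Type: application/json' \
#     'localhost:9200/test/_search?pretty' -d @-
```

Running both variants one after the other and comparing the "value" field in the bucket's top_hits should make the difference visible on an affected version.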

What if you issue an explicit refresh after the update?


An explicit refresh doesn't help.

However, the correct value does appear after some time, roughly several minutes, even without an explicit refresh. Is this the expected behavior, or is something wrong?

How are you doing the refresh? Are you waiting until it has completed?

curl -X POST -H 'Content-Type: application/json' 'localhost:9200/test/_update/1?refresh=wait_for' -d ...

or

curl -X POST localhost:9200/test/_refresh

Neither helps, and both wait for completion, if I understand correctly. But as I mentioned before, after several minutes I can see the updated value even without these explicit refreshes.
My Elasticsearch instance has default settings (I just downloaded and installed the latest version).

Thanks for reporting @And390, this is indeed a bug. I opened https://github.com/elastic/elasticsearch/pull/63864 for the fix.

