Metric aggregation fails in composite aggregation


I'm using Elasticsearch to store the test results for a piece of software. I'm running Elasticsearch on Docker CE with the following configuration:

Docker version 17.12.0-ce, build c97c6d6
Ubuntu 14.04.5 LTS 14.04 trusty

The documents look like the following:


I want to compute the average Pass and Fail numbers for every combination of OS name and os.version, using only the newest software version for each combination, so I use the following composite aggregation query:

GET _search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "products": {
      "composite": {
        "sources": [
          { "osName": { "terms": { "field": "" } } },
          { "osVer": { "terms": { "field": "product.os.version.keyword" } } }
        ]
      },
      "aggs": {
        "revision": {
          "terms": {
            "field": "",
            "order": { "SWDate": "desc" },
            "size": 1
          },
          "aggs": {
            "SWDate": {
              "max": { "field": "" }
            },
            "Pass": {
              "avg": {
                "field": "results.summary.Pass",
                "missing": 0
              }
            }
          }
        }
      }
    }
  }
}

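For anyone trying to reproduce this, the same request body could be built programmatically along the lines of the Python sketch below. This is only an illustration: `build_composite_query` and its parameter names are hypothetical, and the OS name, revision and date fields are passed in as arguments rather than hard-coded. Only `product.os.version.keyword` and `results.summary.Pass` are taken from the query above.

```python
import json


def build_composite_query(os_name_field, revision_field, date_field,
                          os_version_field="product.os.version.keyword",
                          pass_field="results.summary.Pass"):
    """Build the search body for the composite query (hypothetical helper)."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "products": {
                # One bucket per (osName, osVer) combination.
                "composite": {
                    "sources": [
                        {"osName": {"terms": {"field": os_name_field}}},
                        {"osVer": {"terms": {"field": os_version_field}}},
                    ]
                },
                "aggs": {
                    # Keep only the revision with the newest SWDate.
                    "revision": {
                        "terms": {
                            "field": revision_field,
                            "order": {"SWDate": "desc"},
                            "size": 1,
                        },
                        "aggs": {
                            "SWDate": {"max": {"field": date_field}},
                            "Pass": {"avg": {"field": pass_field, "missing": 0}},
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # The three field names below are placeholders, not real mapping names.
    body = build_composite_query("os_name_field", "revision_field", "date_field")
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET _search` once the placeholders are replaced with the real field names.
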
However, this fails with:

"_shards": {
  "total": 6,
  "successful": 1,
  "skipped": 0,
  "failed": 5,
  "failures": [
    {
      "shard": 0,
      "index": "test_results",
      "node": "<nodeId>",
      "reason": {
        "type": "illegal_state_exception",
        "reason": "Cannot replay yet, collection is not finished: postCollect() has not been called"
      }
    }
  ]
}

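To check whether every reported shard failure carries the same exception, the `failures` array could be grouped with something like the following Python sketch. It assumes the search response has already been parsed into a dict; `summarize_shard_failures` is a hypothetical helper name, not part of any Elasticsearch client.

```python
from collections import Counter


def summarize_shard_failures(response):
    """Count shard failures by (exception type, message) in a parsed response."""
    shards = response.get("_shards", {})
    reasons = Counter(
        (f.get("reason", {}).get("type"), f.get("reason", {}).get("reason"))
        for f in shards.get("failures", [])
    )
    return {
        "total": shards.get("total", 0),
        "failed": shards.get("failed", 0),
        "reasons": dict(reasons),
    }


if __name__ == "__main__":
    # Values copied from the response fragment above; only one failure
    # entry is shown there, so only one is listed here.
    sample = {
        "_shards": {
            "total": 6,
            "successful": 1,
            "skipped": 0,
            "failed": 5,
            "failures": [
                {
                    "shard": 0,
                    "index": "test_results",
                    "node": "<nodeId>",
                    "reason": {
                        "type": "illegal_state_exception",
                        "reason": "Cannot replay yet, collection is not "
                                  "finished: postCollect() has not been called",
                    },
                }
            ],
        }
    }
    print(summarize_shard_failures(sample))
```
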
If I remove the Pass average aggregation, the query does not fail, but it does fail with other metric aggregations in its place. Any ideas what I might be doing wrong? Thanks!

It seems that with Elasticsearch 6.1.1 the query works intermittently, whereas with 6.2.1 the failure is fully reproducible.

I have reported this, as I believe it is a bug.

