Pipeline aggregation in Composite Aggregation

I'm trying to use a pipeline aggregation with a composite aggregation, similar to this post. I'm getting a similar error in version 6.8. It looks like this was a bug in version 6.2; I'm wondering whether it has been fixed in version 6.8 or later.

Adding more details.

Here is the composite aggregation query I'm using with a stats_bucket pipeline aggregation; it does not work.

{
  "aggregations": {
    "my_buckets": {
      "composite": {
        "size": 2,
        "sources": [
          {
            "city": {
              "terms": {
                "field": "city",
                "order": "asc"
              }
            }
          },
          {
            "subject": {
              "terms": {
                "field": "subject",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "rating_avg": {
          "sum": {
            "field": "rating"
          }
        }
      }
    }
  },
  "ratings_agg": {
    "stats_bucket": {
      "buckets_path": "my_buckets>rating_avg"
    }
  }
}

The following query is similar to the one above but uses nested terms aggregations instead of a composite aggregation; here the pipeline aggregation works as expected.

{
  "aggregations": {
    "my_buckets": {
      "aggregations": {
        "subject": {
          "terms": {
            "field": "subject"
          },
          "aggregations": {
            "city": {
              "terms": {
                "field": "city"
              }
            },
            "rating_avg": {
              "stats": {
                "field": "rating"
              }
            }
          }
        },
        "ratings_agg": {
          "stats_bucket": {
            "buckets_path": "subject>rating_avg.avg"
          }
        }
      }
    }
  }
}

As our docs state, pipeline aggregations are still not supported together with composite aggregations.

However, there is an alternative for your use case. If you upgrade to at least 7.5, you can use a transform to pivot the data in a similar way to your composite aggregation; our e-commerce example is similar to yours. The result is written to an index, and you can run a stats aggregation on it.
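As a rough sketch of that approach, a transform body along these lines could be sent to `PUT _transform/<transform_id>` and then started with `POST _transform/<transform_id>/_start`. The index names `ratings` and `ratings_by_city_subject` and the output field name `rating_sum` are assumptions for illustration; the `city`, `subject` and `rating` fields come from the query above.

```json
{
  "source": { "index": "ratings" },
  "pivot": {
    "group_by": {
      "city": { "terms": { "field": "city" } },
      "subject": { "terms": { "field": "subject" } }
    },
    "aggregations": {
      "rating_sum": { "sum": { "field": "rating" } }
    }
  },
  "dest": { "index": "ratings_by_city_subject" }
}
```

Once the transform has written one document per city/subject pair, a plain stats aggregation on the destination index (searching the assumed `ratings_by_city_subject` index) should give roughly what the stats_bucket pipeline was meant to compute:

```json
{
  "size": 0,
  "aggregations": {
    "ratings_agg": { "stats": { "field": "rating_sum" } }
  }
}
```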

Thanks for the reply. Will Elasticsearch support pipeline aggregations on composite aggregations any time soon? Does the Elasticsearch team have this feature on the roadmap?

The composite aggregation was introduced to allow paginating through aggregation results, so it's made for large amounts of data. Supporting a pipeline aggregation would require a global view of that data, but each composite request returns only one page of buckets. This is a fundamental problem; storing the data on disk, as transform does, is the only way to solve it.
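To illustrate the pagination, here is a minimal sketch of a follow-up composite request. The `after` values are hypothetical; in practice they are copied from the `after_key` returned with the previous page. Each request like this only sees the next `size` buckets, which is why a statistic over all buckets cannot be computed within a single request.

```json
{
  "size": 0,
  "aggregations": {
    "my_buckets": {
      "composite": {
        "size": 2,
        "sources": [
          { "city": { "terms": { "field": "city" } } },
          { "subject": { "terms": { "field": "subject" } } }
        ],
        "after": { "city": "Berlin", "subject": "math" }
      },
      "aggregations": {
        "rating_sum": { "sum": { "field": "rating" } }
      }
    }
  }
}
```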

For me the question is whether we solve the use case, not whether composite aggs support pipelines. I think we solved the problem with transform, although it could be made easier to use.

That said, "large amounts of data" is a matter of definition. We improved aggregations over time and, especially in the 7.x series, reduced their memory consumption. This eventually allowed us to increase the default maximum number of buckets: starting from 7.9, an aggregation can return 65k buckets instead of 10k.

If your aggregation produces fewer than 65k buckets, you don't need a composite aggregation in 7.9 and beyond.
