Calculating aggs at each level of subaggregation

elkcurious · March 24, 2016, 6:47pm

Hi,

I have event data like this. And I need to calculate aggs at multiple levels of bucket aggregations. I cannot find a way to calculate aggs except at the "leaf level".

{ "id": 1, "field1" : "A1", "field2" : "B11", "field3" : 100, "field4" : 2.0}
{ "id": 2, "field1" : "A1", "field2" : "B21", "field3" : 95, "field4" : 1.8}
{ "id": 3, "field1" : "A1", "field2" : "B11", "field3" : 124, "field4" : 1.6}

{ "id": 4, "field1" : "A2", "field2" : "B12", "field3" : 104, "field4" : 2.1}
{ "id": 5, "field1" : "A2", "field2" : "B22", "field3" : 101, "field4" : 1.7}
{ "id": 6, "field1" : "A2", "field2" : "B22", "field3" : 103, "field4" : 1.4}

In my data set,

field3 and field4 can be any numeric value and the values vary with each event.
field1 and field2 are used as terms in query.
A1 has multiple Bx1 terms. Similarly, A2 has multiple Bx2 terms.
I'm interested in averaging fields3 and fields4 by buckets of Bxy terms. I would get 1 avg for each Bxy term (level2 avg).

e.g id 5, 6 would become one level2 avg (based on A2, B22 term combination). Similarly, id 1, 3 would become one level2 avg (based on A1, B11 term combination)

NOTE: I am able to use nested terms aggregations and calculate this level2 avg.

Then I need a rollup avg for A1 (level1), that is an avg of the avg calculated at level2.

I'm unable to figure out how to do this rollup avg. I tried using bucket_path, but get "Found two sub aggregation definitions" error as I have a terms and avg aggregation at level1.

If I were successful, I would calculate another rollup avg for all Ax (level0), that is an avg of all level1 avg computed.

How can I do this in ES?

Thanks.

The working level2 avg query is below. The attempt at using bucket_path is further below.

GET _search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "by_time": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      },
      "aggs": {
        "field1_aggs": {
          "terms": {
            "field": "field1.raw",
            "size": 0
          },
          "aggs": {
            "field2_aggs": {
              "terms": {
                "field": "field2.raw",
                "size": 0
              },
              "aggs": {
                "field3_avg": {
                  "avg": {
                    "field": "field3"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Attempt with bucket_path

GET _search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "by_time": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      },
      "aggs": {
        "field1_aggs": {
          "terms": {
            "field": "field1.raw",
            "size": 0
          },
          "aggs": {
            "field2_aggs": {
              "terms": {
                "field": "field2.raw",
                "size": 0
              },
              "aggs": {
                "field3_avg": {
                  "avg": {
                    "field": "field3"
                  }
                }
              }
            }
          },
          "aggs": {
            "field1_avg": {
              "avg": {
                "buckets_path": { "field2_aggs>field3_avg.avg" }
              }
            }
          }
        }
      }
    }
  }
}

elkcurious · April 1, 2016, 6:02pm

Hi,

I've still not succeeded to find a way - does ES even allow such multi-level computations in a single query?

Or, do I have to index the result from the first query that calculates level2 result and then issue a second query to calculate the level 1 result?

Thx

elkcurious · April 5, 2016, 1:12am

Replying to my own messages ...

It seems I need to use pipeline aggregations vs. "normal" aggregations. The official documentation called pipeline aggregations experimental, so I had not paid attention earlier, but they are very relevant for my use-case as described in this blog https://www.elastic.co/blog/out-of-this-world-aggregations

However, I still cannot get the syntax right; regardless of where I add the the avg pipeline aggregation to the query, I get errors with this

          "field1_avg": {
             "avg_bucket": {
                "buckets_path": "field2_aggs>field3_avg" 
            }

Or by adding ".value" to the buckets_path

"buckets_path": "field2_aggs>field3_avg.value"

Error:

buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]

elkcurious · April 6, 2016, 10:56pm

This is solved now. For benefit of anyone else that looks at this thread : The problem was that the pipeline aggregation needed to be added at the sibling level of the field2_aggs (and I was originally adding it one level up in the query).

I could then also add another pipeline agg one level up, at the field1_aggs level, that used the output from the first pipeline agg to compute the top-most level agg.

swelltt0576 · May 27, 2017, 6:01am

dear sir,
I have the same problem now and have some confusing with you answer.
can you give me any code?

thanks very much

Topic		Replies	Views
Calculating Average of Average Elasticsearch	3	1487	July 5, 2017
Pipeline aggs with more than one level depth Elasticsearch	1	216	September 22, 2021
Cross-aggregartion calculation Elasticsearch	2	447	January 9, 2018
Multiple Sum subaggregations at same level using Java API Elasticsearch	1	1116	February 6, 2017
Getting sum sub aggregation Elasticsearch	2	4814	August 15, 2017

Calculating aggs at each level of subaggregation

Related topics