Pipeline Aggregations with buckets_path more than two

Ruslan_Didyk · December 19, 2016, 1:44pm

Hi all, I'm getting an error when I'm trying to use Pipeline Aggregation with bucket_path more than one (I.e.: "buckets_path" : "aggOne>aggTwo>aggThree"). For example I have query with 'terms' for two fields and some aggregations for those groups, and I want to get aggregation result from those aggregations.
Here is my simple mappings: (I use Elasticsearch 5.1.1)

Author (name=text, gender=text, age=integer)
Book (title=text, pages=integer, _parent=book)

Query below with 'terms' for only one field and aggregation works fine:

GET author/author/_search

{
    "size": 0,
    "aggs" : {
        "gender" : {
            "terms" : {
                "field" : "gender.keyword"
            },
            "aggs" : {
                "avg" : {
                    "avg" : {
                        "field" : "age"
                    }
                }
            }
        },
        "max" : {
            "max_bucket" : {
                "buckets_path" : "gender>avg"
            }
        }
    }
}

But for query with 'terms' for two fields and aggregation shows an error:

GET author/author/_search

{
   "size":0,
   "aggs":{
      "gender":{
         "terms":{
            "field":"gender.keyword"
         },
         "aggs":{
            "age":{
               "terms":{
                  "field":"age"
               },
               "aggs":{
                  "avg":{
                     "avg":{
                        "field":"age"
                     }
                  }
               }
            }
         }
      },
      "max":{
         "max_bucket":{
            "buckets_path":"gender>age>avg"
         }
      }
   }
}

Error message:

{
   "error": {
      "root_cause": [],
      "type": "reduce_search_phase_exception",
      "reason": "[reduce] ",
      "phase": "fetch",
      "grouped": true,
      "failed_shards": [],
      "caused_by": {
         "type": "aggregation_execution_exception",
         "reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]"
      }
   },
   "status": 503
}

Do you have any ideas how to solve this issue? Or Pipeline Aggregations do not support this case? Thank's a lot for any suggestions.

P.S: Actually I've found some workaround how to solve this, but it doesn't look very nice.
Here it is:

{  
   "size":0,
   "aggs":{  
      "gender":{  
         "terms":{  
            "field":"gender.keyword"
         },
         "aggs":{  
            "age":{  
               "terms":{  
                  "field":"age"
               },
               "aggs":{  
                 "avg":{  
                     "avg":{  
                        "field":"age"
                     }
                  }
               }
            },
            "max_avg":{  
               "max_bucket":{  
                  "buckets_path":"age>avg"
               }
            }
         }
      },
      "max":{  
         "max_bucket":{  
            "buckets_path":"gender>max_avg"
         }
      }
   }
}

I've added one more Pipeline Aggregation to aggregate max average for each 'age' groups, and then my main Pipeline Aggregation aggregates max from aggregated values.
Response look's like this:

"aggregations": {
      "gender": {
         "buckets": [
            {
               "key": "male",
               "doc_count": 4,
               "age": {
                  "buckets": [
                     {
                        "key": 80,
                        "doc_count": 2,
                        "avg": {
                           "value": 80
                        }
                     },
                     {
                        "key": 22,
                        "doc_count": 1,
                        "avg": {
                           "value": 22
                        }
                     },
                     {
                        "key": 40,
                        "doc_count": 1,
                        "avg": {
                           "value": 40
                        }
                     }
                  ]
               },
               "max": {
                  "value": 80,
                  "keys": [
                     "80"
                  ]
               }
            },
            {
               "key": "female",
               "doc_count": 2,
               "age": {
                  "buckets": [
                     {
                        "key": 40,
                        "doc_count": 2,
                        "avg": {
                           "value": 40
                        }
                     }
                  ]
               },
               "max": {
                  "value": 40,
                  "keys": [
                     "40"
                  ]
               }
            }
         ]
      },
      "max": {
         "value": 80,
         "keys": [
            "male"
         ]
      }
   }

Mark_Harwood · December 19, 2016, 1:51pm

Is this just a theoretical exercise? I can't see what business problem you're trying to solve here - the same choice of field being used in terms agg and avg aggs doesn't make sense.

Ruslan_Didyk · December 19, 2016, 2:34pm

Mappings and queries that I provided are just an example. Sorry, maybe this wasn't the best case:)
Buts it's not theoretical exercise.

What I'm trying to do is to implement service with OData (http://www.odata.org/) specification, and it supports such kind of queries.
So there could be case when someone wants to group by multiple fields, aggregate some other fields values and then get final aggregated value from those groups.

Query example from OData:

GET ~/Sales?$apply=groupby((Time),aggregate(Amount with sum as Total))/aggregate(Total with average as DailyAverage)

So I wonder, whether Pipeline Aggregation work's fine or this case is discussable?

Mark_Harwood · December 19, 2016, 3:14pm

So if I'm reading the example right it is trying to do some kind of standard deviation where each day's total is expressed as a multiple of the average sales for all days.

To do this you'd need something like this:

DELETE test
POST test/sales
{
	"amount":10,
	"date":"2012-09-21"
}
POST test/sales
{
	"amount":20,
	"date":"2012-09-21"
}
POST test/sales
{
	"amount":5,
	"date":"2012-09-22"
}
POST test/sales
{
	"amount":15,
	"date":"2012-09-22"
}

GET test/sales/_search
{
   "size": 0,
   "aggs": {
	  "by_date": {
		 "date_histogram": {
			"field": "date",
			"interval": "day"
		 },
		 "aggs": {
			"Total": {
			   "sum": {
				  "field": "amount"
			   }
			},
			"normalized_sales": {
			   "bucket_script": {
				  "buckets_path": {
					 "dailySalesTotal": "Total",
					 //!!!! THIS DOES NOT WORK.... !!!!!
					 "allDaysAverage": "../global_stats>avg"
				  },
				  "script": "params.dailySalesTotal / params.allDaysAverage"
			   }
			}
		 }
	  },
	  "global_stats": {
		 "stats_bucket": {
			"buckets_path": "by_date>Total"
		 }
	  }
   }
}

But notice this won't work because inside the array of by_date results we can't refer to a parent-level stat (attempted here using the illegal .. syntax). The docs state:

Paths are relative from the position of the pipeline aggregation; they are not absolute paths, and the path cannot go back "up" the aggregation tree

system · January 16, 2017, 3:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Does Pipeline aggregation have path length limitation? Elasticsearch	1	497	December 31, 2017
Aggregate the total sum of aggregation field (bucket_path) Elasticsearch	12	5831	July 5, 2017
Pipeline aggs with more than one level depth Elasticsearch	1	216	September 22, 2021
Elasticsearch 6.2 - Composite Bucket Aggregation with Sum Bucket Aggregation and Pipeline Aggregations Elasticsearch	3	3123	August 30, 2018
Buckets_path to access field in top_hits aggregation result Elasticsearch	2	2247	July 13, 2017

Pipeline Aggregations with buckets_path more than two

Related topics