Bucket Script Aggregation on Range Aggregation

I am trying to do a bucket script aggregation on a range aggregation (Date Histogram being the parent) and is ending up with an exception. The query that I am trying is

{
"aggs": {
    "perhour": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "min_doc_count": 0
        },
      "aggs": {
        "apdsatrange": {
          "range": {
            "field": "timetaken",
            "ranges": [
              {
                "from": 0,
                "to": 500
              }              
            ],
            "keyed": true
          },
		  "aggs" : {
			"sat_count" : { "value_count" : { "field" : "timetaken" } }
			}
        },
		"apdtolrange": {
          "range": {
            "field": "timetaken",
            "ranges": [
              {
                "from": 500,
                "to": 2000
              }              
            ],
            "keyed": true
          },
		  "aggs" : {
			"tol_count" : { "value_count" : { "field" : "timetaken" } }
			}
        },
		"apdcalc": {
			"bucket_script": {
                "buckets_path": {
					"sat": "apdsatrange>sat_count",
					"tol": "apdtolrange>tol_count"
					},
				"script": "sat + tol/2"
			}
		}
      }
    }
  }
}

I have used two different range aggregation so as to refer the count for each in the bucket script. I end up with the following exception on execution

{
  "error" : {
    "root_cause" : [ ],
    "type" : "reduce_search_phase_exception",
    "reason" : "[reduce] ",
    "phase" : "fetch",
    "grouped" : true,
    "failed_shards" : [ ],
    "caused_by" : {
      "type" : "aggregation_execution_exception",
      "reason" : "buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]"
    }
  },
  "status" : 503
}

Without the bucket aggregation, I am getting the individual range values as expected. eg:

  "aggregations" : {
    "perhour" : {
      "buckets" : [ {
        "key_as_string" : "2016-01-04T00:00:00.000Z",
        "key" : 1451865600000,
        "doc_count" : 4413,
        "apdtolrange" : {
          "buckets" : {
            "500.0-2000.0" : {
              "from" : 500.0,
              "from_as_string" : "500.0",
              "to" : 2000.0,
              "to_as_string" : "2000.0",
              "doc_count" : 48,
              "tol_count" : {
                "value" : 48
              }
            }
          }
        },
        "apdsatrange" : {
          "buckets" : {
            "0.0-500.0" : {
              "from" : 0.0,
              "from_as_string" : "0.0",
              "to" : 500.0,
              "to_as_string" : "500.0",
              "doc_count" : 4357,
              "sat_count" : {
                "value" : 4357
              }
            }
          }
        }
      },....

Can anyone please help me in identifying what is wrong with the query or with a better way of doing this.?
ES version is 2.1

1 Like

The problem here is you are trying to get values in the bucket_script aggregation from within a multi-bucket aggregation (the range aggregation). Currently in pipeline aggregations you can only access values that are within a single-bucket aggregation, a metric aggregation, or another pipeline aggregation at the same level.

In you case, since you only have a single range in you range aggregation anyway, you could replace the range aggregations with filter aggregations instead which filter the same range. That way you can reference the values from within your bucket_script aggregation.

3 Likes

Thanks for the reply Colin. Is there any chance there would be multi bucket aggregation support in the future.

I had split up my range aggregation to have single range so as to access them via bucket script. Regarding using filter aggregation, wouldn't that still involve having a range query.? I tried the following but still end up with the same error.

"apdsatrange": {
          "filter": {
            "range": {
            "timetaken": {
                "from": 0,
                "to": 500
              }              
          }
	}
      }
1 Like

It is something we would like to do but we need to come up with a clean way of doing this.

The problem is, regardless of how many buckets the range aggregation output has, it is still a multi-bucket aggregation, whereas the filter aggregation is a single bucket aggregation (only ever outputs one bucket)

Yes you would need to use a range query in the filter aggregation but this is different from a range aggregation.

Could you post the full aggregations request you used with the filter aggregation (all aggregation in your request) so I can see how you are trying to use the filter aggregation?

1 Like

Thanks Colin. Here is what I have tried

{
"aggs": {
    "perhour": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "min_doc_count": 0
        },
      "aggs": {
        "apdsatrange": {
          "filter": {
            "range": {
            "timetaken": {
                "from": 0,
                "to": 500
              }              
          }
		 }
      },
	  "apdtolrange": {
          "filter": {
            "range": {
            "timetaken": {
                "from": 500,
                "to": 2000
              }              
            }
          }
	  },
	  "apdcalc": {
			"bucket_script": {
                "buckets_path": {
					"sat": "apdsatrange",
					"tol": "apdtolrange"
					},
				"script": "sat + tol/2"
			}
		}
    }
  }
}
}
2 Likes

That looks mostly correct but I think you just need to change your buckets_paths match the following snippet:

"apdcalc": {
    "bucket_script": {
        "buckets_path": {
            "sat": "apdsatrange>_count",
            "tol": "apdtolrange>_count"
        },
        "script": "sat + tol/2"
    }
}

This indicates you want the doc_count attribute from the filter aggregation.

Let me know if that works.

2 Likes

That seems to have worked. Thanks a lot for your help.

Cheers
S

1 Like