Script with input from one particular bucket, or from a higher level aggregation

rad · March 3, 2020, 12:17pm

I want to compare the daily average of a metric (the frequency of words appearing in texts) to the value of a specific day. This is during a week. My goal is to check whether there's a spike. If the last day is way higher than the daily average, I'd trigger an alarm.

So from my input in Elasticsearch I compute the daily average during the week and find out the value for the last day of that week.

For getting the daily average for the week, I simply cut a week's worth of data using a range query on date field, so all my available data is the given week. I compute the sum and divide by 7 for a daily average.

For getting the last day's value, I did a terms aggregation on the date field with descending order and size 1

The whole output is as follows. Here you can see words "rama0" and "rama1" with their corresponding frequencies.

{
  "aggregations" : {
    "the_keywords" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "rama0",
          "doc_count" : 4200,
          "the_last_day" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 3600,
            "buckets" : [
              {
                "key" : 1580169600000,
                "key_as_string" : "2020-01-28T00:00:00.000Z",
                "doc_count" : 600,
                "the_last_day_frequency" : {
                  "value" : 3000.0
                }
              }
            ]
          },
          "the_weekly_sum" : {
            "value" : 21000.0
          },
          "the_daily_average" : {
            "value" : 3000.0
          }
        },
        {
          "key" : "rama1",
          "doc_count" : 4200,
          "the_last_day" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 3600,
            "buckets" : [
              {
                "key" : 1580169600000,
                "key_as_string" : "2020-01-28T00:00:00.000Z",
                "doc_count" : 600,
                "the_last_day_frequency" : {
                  "value" : 3000.0
                }
              }
            ]
          },
          "the_weekly_sum" : {
            "value" : 21000.0
          },
          "the_daily_average" : {
            "value" : 3000.0
          }
        },
        [...]
      ]
    }
  }
}

Now I have the_daily_average in a high level of the output, and the_last_day_frequency in the single-element buckets list in the_last_day aggregation. I cannot use a bucket_script to compare those, because I cannot refer to a single bucket (if I place the script outside the_last_day aggregation) and I cannot refer to higher-level aggregations if I place the script inside the_last_day.

IMO the reasonable thing to do would be to put the script outside the aggregation and use a buckets_path using the <AGG_NAME><MULTIBUCKET_KEY> syntax mentioned in the docs, but I have tried "var1": "the_last_day[1580169600000]>the_last_day_frequency" and variations (hardcoding first until it works), but I haven't been able to refer to a particular bucket.

My ultimate goal is to have a list of keywords for which the last day frequency greatly exceeds the daily average.

For anyone interested, my current query is as follows. Notice that the part I'm struggling with is commented out.

body='{
    "query": {
        "range": {
            "date": {
                "gte": "START",
                "lte": "END"
            }
        }
    },
    "aggs": {
        "the_keywords": {
            "terms": {
                "field": "keyword",
                "size": 100
            },
            "aggs": {
                "the_weekly_sum": {
                    "sum": {
                        "field": "frequency"
                    }
                },
                "the_daily_average" : {
                    "bucket_script": {
                        "buckets_path": {
                            "weekly_sum": "the_weekly_sum"
                        },
                        "script": {
                            "inline": "return params.weekly_sum / 7"
                        }
                    }
                },
                "the_last_day": {
                    "terms": {
                        "field": "date",
                        "size": 1,
                        "order": {"_key": "desc"}
                    },
                    "aggs": {
                        "the_last_day_frequency": {
                            "sum": {
                                "field": "frequency"
                            }
                        }
                    }
                }/*,
                "the_spike": {
                    "bucket_script": {
                        "buckets_path": {
                            "last_day_frequency": "the_last_day>the_last_day_frequency",
                            "daily_average": "the_daily_average"
                        },
                        "script": {
                            "inline": "return last_day_frequency / daily_average"
                        }
                    }
                }*/
            }
        }
    }
}'

system · March 31, 2020, 12:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Get value of a list Elasticsearch	3	518	January 6, 2017
Hourly aggregation for timestamp and Java API irrespective of date Elasticsearch	8	2569	December 28, 2017
Get number of days in a month to aggregate data Elasticsearch	7	1404	December 4, 2019
New to ES/Compare daily data to previous date's data Elasticsearch	7	5222	May 1, 2018
Above average Elasticsearch	4	392	July 9, 2018

Script with input from one particular bucket, or from a higher level aggregation

Related topics