Use script to select from non-numeric buckets -- possible?

anelson-edge · August 22, 2019, 10:09pm

To solve my problem, I think what I want is to use scripting to select buckets, but I can't figure out how to do it. All my attempts run into complaints that I must point to a numeric aggregation.

In the data below, I want to bucket by alert name, then for each of those buckets see if the latest record is "fire" or "resolve".

I tried various combinations of filter, a term sub-aggregation, a sibling aggregation, and a parent aggregation. All of them fail with one of these complaints: you can't do a sub-aggregation on top_hits, you can't point a buckets_path to a non-numeric agg result, or the max aggregation cannot have sub-buckets.

POST _bulk?refresh
{ "index" : { "_index" : "alerts", "_id" : "1", "_type" : "log" } }
{ "name" : "storage_controller_failure", "@timestamp" : "2019-08-22T15:00:00", "fr" : "fire"}
{ "index" : { "_index" : "alerts", "_id" : "2", "_type" : "log" } }
{ "name" : "storage_controller_failure" , "@timestamp" : "2019-08-22T16:00:00", "fr" : "fire" }
{ "index" : { "_index" : "alerts", "_id" : "3", "_type" : "log" } }
{ "name" : "storage_controller_failure", "@timestamp" : "2019-08-22T17:00:00" , "fr" : "fire"}
{ "index" : { "_index" : "alerts", "_id" : "4", "_type" : "log" } }
{ "name" : "drive_corrupted" , "@timestamp" : "2019-08-22T15:00:00", "fr" : "fire" }
{ "index" : { "_index" : "alerts", "_id" : "5", "_type" : "log" } }
{ "name" : "drive_corrupted" , "@timestamp" : "2019-08-22T16:00:00", "fr" : "resolve"}

Suppose I want to group by alert name.
In this case I would get 2 buckets,
one with the key "storage_controller_failure" and the other "drive_corrupted".
This I can do. Easy-peasy.

What I really want to do is determine if the last record in the bucket is "fire".
Let's assume that the only values for the name "fr" are "fire" and "resolve".
In the data above, the "drive_corrupted" bucket should be dropped because the latest record is "resolve".

I am failing to figure out how to express what I want in the query language.
I can certainly use top_hits or do max of @timestamp to order the records in each bucket.
I can even limit the output of the top_hits or the max so that each bucket only has the latest record ( a single record in each bucket).

Every time I try to apply a script or filter, I run into the restriction that the script or filter can only apply to numbers.

Here is one of my attempts:

GET alerts*/_search
{
  "size" : 0,
  "aggs": {
    "by_alert_name": {
      "terms": {  "field": "name",  "size": 10  },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": { "order": "desc" } } ]
          }
        },
        "firing" : {
          "bucket_script": {
            "buckets_path": {"latest" : "latest"},
            "script": "return 1"
          }
        }
      }
    }
  }
}

and the error message:

"reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.metrics.tophits.InternalTopHits"
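
One way to stay inside the numeric-only restriction named in this error, sketched under assumptions and not something tried in this thread: compare each bucket's overall max of @timestamp with the max of @timestamp among only its "fire" documents, and keep the bucket when the two are equal. The aggregation names latest, fire_only, latest_fire and still_firing are made up for illustration, and the term filter assumes fr is mapped as a keyword. The body would replace the one sent to GET alerts*/_search.

```json
{
  "size": 0,
  "aggs": {
    "by_alert_name": {
      "terms": { "field": "name", "size": 10 },
      "aggs": {
        "latest": { "max": { "field": "@timestamp" } },
        "fire_only": {
          "filter": { "term": { "fr": "fire" } },
          "aggs": {
            "latest_fire": { "max": { "field": "@timestamp" } }
          }
        },
        "still_firing": {
          "bucket_selector": {
            "buckets_path": {
              "latest": "latest",
              "latestFire": "fire_only>latest_fire"
            },
            "script": "params.latest == params.latestFire"
          }
        }
      }
    }
  }
}
```

With the sample data, "drive_corrupted" has a latest record at 16:00 but a latest "fire" at 15:00, so the selector should drop it, while "storage_controller_failure" should remain. A "fire" and a "resolve" sharing the exact same timestamp would count as still firing under this comparison.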

rjernst · August 23, 2019, 5:06pm

What I really want to do is determine if the last record in the bucket is "fire".

It sounds like you don't actually want to aggregate, but instead reduce search results for each alert to a single document. Have you looked at field collapsing?
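
For reference, a field collapsing request along these lines might look like the sketch below, sent as the body of GET alerts*/_search. It assumes name is a keyword field with doc values, which collapse requires; the _source list and size are illustrative choices.

```json
{
  "query": { "match_all": {} },
  "collapse": { "field": "name" },
  "sort": [ { "@timestamp": { "order": "desc" } } ],
  "_source": [ "name", "@timestamp", "fr" ],
  "size": 10
}
```

Each returned hit would be the most recent document for one alert name. This sketch does not filter on the collapsed result, so checking whether that hit's fr is "fire" would still be left to the client.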

anelson-edge · August 23, 2019, 9:15pm

I think I need to aggregate first before I collapse because I want to know the last status for each name.

anelson-edge · August 23, 2019, 9:44pm

I'm trying a script-metric now so that I can deal with the types manually (to avoid those pesky numeric type requirements).

The early signs are promising, but type conversions between Elasticsearch types and Painless/Groovy/Java types are not painless.
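
A scripted_metric of this kind might look like the following sketch. It is an illustration, not the code actually used here, and it has not been run against a cluster. It assumes a 7.x-style Painless where doc['@timestamp'].value exposes toInstant(), that fr is a keyword field, and that every document has both fields. The name latest_state is hypothetical. Each shard keeps the newest timestamp it has seen together with that document's fr, and the reduce step picks the newest across shards and returns its fr as a string.

```json
{
  "size": 0,
  "aggs": {
    "by_alert_name": {
      "terms": { "field": "name", "size": 10 },
      "aggs": {
        "latest_state": {
          "scripted_metric": {
            "init_script": "state.ts = -1L; state.fr = null",
            "map_script": "long ts = doc['@timestamp'].value.toInstant().toEpochMilli(); if (ts > state.ts) { state.ts = ts; state.fr = doc['fr'].value }",
            "combine_script": "return state",
            "reduce_script": "def best = null; for (s in states) { if (s != null && s.fr != null && (best == null || s.ts > best.ts)) { best = s } } return best == null ? null : best.fr"
          }
        }
      }
    }
  }
}
```

Under these assumptions each bucket's latest_state value would be "fire" or "resolve". Because that value is a string, dropping the non-"fire" buckets would still happen on the client side in this sketch.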

