Use script to select from non-numeric buckets -- possible?

To solve my problem, I think what I want is to use scripting to select buckets, but I can't figure out how to do it. All my attempts run into complaints that I must point to a numeric aggregation

In the data below, I want to bucket by alert name, then for each of those buckets see if the latest record is "fire" or "resolve",

I tried various combination of filter or a term sub aggregation or sibling aggregation or parent aggregation, and all them either say, you can do a sub-aggregation on top_hits or you can't point a bucket_path to a non-numeric agg result or the max aggregation cannot have sub-buckets

 POST _bulk?refresh
{ "index" : { "_index" : "alerts", "_id" : "1", "_type" : "log" } }
{ "name" : "storage_controller_failure", "@timestamp" : "2019-08-22T15:00:00", "fr" : "fire"}
{ "index" : { "_index" : "alerts", "_id" : "2", "_type" : "log" } }
{ "name" : "storage_controller_failure" , "@timestamp" : "2019-08-22T16:00:00", "fr" : "fire" }
{ "index" : { "_index" : "alerts", "_id" : "3", "_type" : "log" } }
{ "name" : "storage_controller_failure", "@timestamp" : "2019-08-22T17:00:00" , "fr" : "fire"}
{ "index" : { "_index" : "alerts", "_id" : "4", "_type" : "log" } }
{ "name" : "drive_corrupted" , "@timestamp" : "2019-08-22T15:00:00", "fr" : "fire" }
{ "index" : { "_index" : "alerts", "_id" : "5", "_type" : "log" } }
{ "name" : "drive_corrupted" , "@timestamp" : "2019-08-22T16:00:00", "fr" : "resolve"}

Suppose I want to group by alert-name
In this case I would get 2 buckets,
one with the key "storage_controller_failure" and the other"drive_corrupted".
This I can do. Easy-peasy.

What I really want to do is determine if the last record in the bucket is "fire".
Let's assume that the only values for the name "fr" are "fire" and "resolve".
In the data above, the "drive_corrupted" bucket should be dropped because the latest record is "resolve".

I am failing to figure out how to express what I want in the query language.
I can certainly use top_hits or do max of @timestamp to order the records in each bucket.
I can even limit the output of the top_hits or the max so that each bucket only has the latest record ( a single record in each bucket).

Every time I tried to apply a script or filter, I run into the restriction that the script or filter can only apply to numbers.

Here is one of my attempts:

GET alerts*/_search
{
  "size" : 0,
  "aggs": {
    "by_alert_name": {
      "terms": {  "field": "name",  "size": 10  },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": { "order": "desc" } } ]
          }
        },
        "firing" : {
          "bucket_script": {
            "buckets_path": {"latest" : "latest"},
            "script": "return 1"
          }
        }
      }
    }
  }
}

and the error message:

reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.metrics.tophits.InternalTopHits"

What I really want to do is determine if the last record in the bucket is "fire".

It sounds like you don't actually want to aggregate, but instead reduce search results for each alert to a single document. Have you looked at field collapsing?

I think I need to aggregate first before I collapse because I want to know the last status for each name.

I'm trying a script-metric now so that I can deal with the types manually (to avoid those pesky numeric type requirements).

The early signs are promising, but type conversions between elasticsearch types and Painless/Groovy/Java types are not painless. :slightly_smiling_face:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.