To solve my problem, I think what I want is to use scripting to select buckets, but I can't figure out how to do it. All my attempts run into complaints that I must point to a numeric aggregation
In the data below, I want to bucket by alert name, then for each of those buckets see if the latest record is "fire" or "resolve",
I tried various combination of filter or a term sub aggregation or sibling aggregation or parent aggregation, and all them either say, you can do a sub-aggregation on top_hits or you can't point a bucket_path to a non-numeric agg result or the max aggregation cannot have sub-buckets
POST _bulk?refresh
{ "index" : { "_index" : "alerts", "_id" : "1", "_type" : "log" } }
{ "name" : "storage_controller_failure", "@timestamp" : "2019-08-22T15:00:00", "fr" : "fire"}
{ "index" : { "_index" : "alerts", "_id" : "2", "_type" : "log" } }
{ "name" : "storage_controller_failure" , "@timestamp" : "2019-08-22T16:00:00", "fr" : "fire" }
{ "index" : { "_index" : "alerts", "_id" : "3", "_type" : "log" } }
{ "name" : "storage_controller_failure", "@timestamp" : "2019-08-22T17:00:00" , "fr" : "fire"}
{ "index" : { "_index" : "alerts", "_id" : "4", "_type" : "log" } }
{ "name" : "drive_corrupted" , "@timestamp" : "2019-08-22T15:00:00", "fr" : "fire" }
{ "index" : { "_index" : "alerts", "_id" : "5", "_type" : "log" } }
{ "name" : "drive_corrupted" , "@timestamp" : "2019-08-22T16:00:00", "fr" : "resolve"}
Suppose I want to group by alert-name
In this case I would get 2 buckets,
one with the key "storage_controller_failure" and the other"drive_corrupted".
This I can do. Easy-peasy.
What I really want to do is determine if the last record in the bucket is "fire".
Let's assume that the only values for the name "fr" are "fire" and "resolve".
In the data above, the "drive_corrupted" bucket should be dropped because the latest record is "resolve".
I am failing to figure out how to express what I want in the query language.
I can certainly use top_hits or do max of @timestamp to order the records in each bucket.
I can even limit the output of the top_hits or the max so that each bucket only has the latest record ( a single record in each bucket).
Every time I tried to apply a script or filter, I run into the restriction that the script or filter can only apply to numbers.
Here is one of my attempts:
GET alerts*/_search
{
"size" : 0,
"aggs": {
"by_alert_name": {
"terms": { "field": "name", "size": 10 },
"aggs": {
"latest": {
"top_hits": {
"size": 1,
"sort": [ { "@timestamp": { "order": "desc" } } ]
}
},
"firing" : {
"bucket_script": {
"buckets_path": {"latest" : "latest"},
"script": "return 1"
}
}
}
}
}
}
and the error message:
reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.metrics.tophits.InternalTopHits"