Find servers whose last logged event was "error"

Ihor · October 27, 2016, 12:55pm

Hi everyone! Is there any way to filter top hits aggregation? For example I have the following documents in index:

{ name: "some_app", status: "ERROR", created_at: "2016-10-27" }
{ name: "some_app", status: "OK", created_at: "2016-10-26" }
{ name: "another_app", status: "OK", created_at: "2016-10-26" }
{ name: "some_app", status: "ERROR", created_at: "2016-10-25" }
{ name: "another_app", status: "ERROR", created_at: "2016-10-25" }

What I can achieve is to group by name and get latest status using top hits aggregation:

{ name: "some_app", status: "ERROR", created_at: "2016-10-27" }
{ name: "another_app", status: "OK", created_at: "2016-10-26" }

But I want to get a list of documents(grouped by "name") whose latest status is "ERROR". For example:

{ name: "some_app", status: "ERROR", created_at: "2016-10-27" }

My current query is:

 {
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "name"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "created_at": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

Any help would be appreciated. Thanks!

Mark_Harwood · October 27, 2016, 1:22pm

If I understand correctly, your query should be giving you the right data but it is not formatted as you'd like it?
We don't have the equivalent of XSL to translate our JSON response with "buckets" arrays etc into the collapsed format you require - you'd have to do this in your application code.

Ihor · October 27, 2016, 1:55pm

Thank you for the response. No, the problem is that I cannot filter top hits aggregation results. I want to group documents by name and then filter them by status on latest document. For example, how to get documents where latest status is "ERROR"?

If we run query from previous post, we'll get 2 results. One with status "ERROR" and another with status "OK". I want to get only records with status "ERROR".

Documents can go through a lot of stages, and status may change a lot of times, but we should always query only by the last one.

Mark_Harwood · October 27, 2016, 2:58pm

This pipeline agg looks to do the trick (running on 5.0 here)

DELETE test
PUT test
{
   "settings": {
	  "number_of_shards": 1,
	  "number_of_replicas": 0
   },
   "mappings": {
	  "doc": {
		 "properties": {
			"name": {
			   "type": "keyword"
			},
			"status": {
			   "type": "keyword"
			},
			"created_at": {
			   "type": "date"
			}
		 }
	  }
   }
}
POST test/event
{ "name": "some_app", "status": "ERROR", "created_at": "2016-10-27" }
POST test/event
{ "name": "some_app", "status": "OK", "created_at": "2016-10-26" }
POST test/event
{ "name": "another_app", "status": "OK", "created_at": "2016-10-26" }
POST test/event
{ "name": "another_app", "status": "ERROR", "created_at": "2016-10-25" }

GET test/event/_search
{
	"size": 0,
	"aggs": {
		"group": {
			"terms": {
				"field": "name"
			},
			"aggs": {
				"lastStatusDate":{
					"max":{
						"field":"created_at"
					}
				},
				"lastError":{
					"filter":{
						"term":{
							"status":"ERROR"
						}
					},
					"aggs":{
						"lastErrorDate":{
							"max":{
								"field":"created_at"
							}
						}
					}
				},
				 "CheckLastStatusIsError": {
					"bucket_selector": {
						"buckets_path": {
						  "lastStatusDate": "lastStatusDate.value",
						  "lastErrorDate": "lastError>lastErrorDate.value"
						},
						"script": "params.lastErrorDate == params.lastStatusDate"
					}
				}
			}
		}
	}
}

Mark_Harwood · October 27, 2016, 4:18pm

I've taken the liberty of renaming your question subject because

It should help others with the same business problem find this topic
We moved away from "top hits" as being the problem

Ihor · October 27, 2016, 4:29pm

@Mark_Harwood you are a genius! This is exactly what I want. Thank you!

dkulichkin · November 1, 2016, 10:00am

@Mark_Harwood, thanks for assisting your advice has helped me much as well, but is there any way to get rid of the terms buckets payload somehow and thus reduce its size since I only need to know this array's length?

Mark_Harwood · November 1, 2016, 10:08am

I think we're pushing into the limits of what is easily expressed in our DSL and easily computed on-the-fly using raw event data. These questions and more would be so much easier to process using an entity-centric index [1] alongside your event store where the entity in question was a device.

[1] https://www.youtube.com/watch?v=yBf7oeJKH2Y

Topic		Replies	Views
Filter buckets after top_hits aggregation Elasticsearch	12	1644	June 22, 2020
Condition in aggregation does not filter the hit result Elasticsearch	3	395	November 4, 2019
Filter aggregation buckets by top hits scripted field Elasticsearch	1	590	July 5, 2021
Filter after aggregation Elasticsearch	3	796	September 17, 2019
Check the top_hits aggregation results to see if every top hit has a field with a specific value Elasticsearch	6	1160	October 8, 2019

Find servers whose last logged event was "error"

Related topics