Find servers whose last logged event was "error"

Hi everyone! Is there any way to filter top hits aggregation? For example I have the following documents in index:

{ name: "some_app", status: "ERROR", created_at: "2016-10-27" }
{ name: "some_app", status: "OK", created_at: "2016-10-26" }
{ name: "another_app", status: "OK", created_at: "2016-10-26" }
{ name: "some_app", status: "ERROR", created_at: "2016-10-25" }
{ name: "another_app", status: "ERROR", created_at: "2016-10-25" }

What I can achieve is to group by name and get latest status using top hits aggregation:

{ name: "some_app", status: "ERROR", created_at: "2016-10-27" }
{ name: "another_app", status: "OK", created_at: "2016-10-26" }

But I want to get a list of documents(grouped by "name") whose latest status is "ERROR". For example:

{ name: "some_app", status: "ERROR", created_at: "2016-10-27" }

My current query is:

 {
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "name"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "created_at": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

Any help would be appreciated. Thanks!

If I understand correctly, your query should be giving you the right data but it is not formatted as you'd like it?
We don't have the equivalent of XSL to translate our JSON response with "buckets" arrays etc into the collapsed format you require - you'd have to do this in your application code.

Thank you for the response. No, the problem is that I cannot filter top hits aggregation results. I want to group documents by name and then filter them by status on latest document. For example, how to get documents where latest status is "ERROR"?

If we run query from previous post, we'll get 2 results. One with status "ERROR" and another with status "OK". I want to get only records with status "ERROR".

Documents can go through a lot of stages, and status may change a lot of times, but we should always query only by the last one.

This pipeline agg looks to do the trick (running on 5.0 here)

DELETE test
PUT test
{
   "settings": {
	  "number_of_shards": 1,
	  "number_of_replicas": 0
   },
   "mappings": {
	  "doc": {
		 "properties": {
			"name": {
			   "type": "keyword"
			},
			"status": {
			   "type": "keyword"
			},
			"created_at": {
			   "type": "date"
			}
		 }
	  }
   }
}
POST test/event
{ "name": "some_app", "status": "ERROR", "created_at": "2016-10-27" }
POST test/event
{ "name": "some_app", "status": "OK", "created_at": "2016-10-26" }
POST test/event
{ "name": "another_app", "status": "OK", "created_at": "2016-10-26" }
POST test/event
{ "name": "another_app", "status": "ERROR", "created_at": "2016-10-25" }

GET test/event/_search
{
	"size": 0,
	"aggs": {
		"group": {
			"terms": {
				"field": "name"
			},
			"aggs": {
				"lastStatusDate":{
					"max":{
						"field":"created_at"
					}
				},
				"lastError":{
					"filter":{
						"term":{
							"status":"ERROR"
						}
					},
					"aggs":{
						"lastErrorDate":{
							"max":{
								"field":"created_at"
							}
						}
					}
				},
				 "CheckLastStatusIsError": {
					"bucket_selector": {
						"buckets_path": {
						  "lastStatusDate": "lastStatusDate.value",
						  "lastErrorDate": "lastError>lastErrorDate.value"
						},
						"script": "params.lastErrorDate == params.lastStatusDate"
					}
				}
			}
		}
	}
}
3 Likes

I've taken the liberty of renaming your question subject because

  1. It should help others with the same business problem find this topic
  2. We moved away from "top hits" as being the problem

@Mark_Harwood you are a genius! This is exactly what I want. Thank you!

1 Like

@Mark_Harwood, thanks for assisting your advice has helped me much as well, but is there any way to get rid of the terms buckets payload somehow and thus reduce its size since I only need to know this array's length?

I think we're pushing into the limits of what is easily expressed in our DSL and easily computed on-the-fly using raw event data. These questions and more would be so much easier to process using an entity-centric index [1] alongside your event store where the entity in question was a device.

[1] https://www.youtube.com/watch?v=yBf7oeJKH2Y