Check the top_hits aggregation results to see if every top hit has a field with a specific value

I am trying to accomplish what seems like it should be super easy and obviously possible but hitting a hard brick wall instead.

  1. aggregate all elastic documents in a specific index from the last 60 days into buckets by a field within the document (not a problem).

GET /slot10info-*/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"@timestamp": {
"gt": "now-60d"
}
}
}
]
}
},
"size": 0,
"aggs" : {
"all_locos" : {
"terms" : {
"field" : "Vehicle.keyword",
"size" : 500
},
.
.
.

  1. get the top 5 hits WRT time in descending order and only keep two fields from each document. (not a problem).

.
.
.
"aggs": {
"top_date_hit": {
"top_hits": {
"sort": [
{
"Vehicle.Time": {
"order": "desc"
}
}
],
"_source": {
"includes": [ "ATT.PPP Status", "VZW.PPP Status" ]
},
"size" : 5
}
},
}
}
}
}

  1. I want to look at the "ATT.PPP Status" and "VZW.PPP Status" fields respective values OF THESE 5 TOP HITS ONLY, and see if they match, "PPP Link is down".

NOTE:
elasticsearch version: 5.6.9
I have tried for days to figure out how to simply give me a list of "Vehicle" bucket whose last 5 documents have the value in "PPP link is down" in either of the fields listed above. Please let me know if I left off anything important.

In short, it's not possible because top_hits are a last-minute enrichment of the results. E.g. they are collected at the very end after everything else is done, and they are collecting the raw _source JSON data not the internal indexed data. So the values being returned are essentially a string blob to Elasticsearch and there's no way to process it at the moment.

top_hits are mainly for enriching search results in the UI, not for actual logic or processing.

Is there a reason you can't put the "PPP Link is down" criteria in the query itself, using a term or match query?

@polyfractal That's what I was afraid of and kind of read similar things in other places but I was thinking maybe there was a way around it.. maybe with a painless script or similar.

I dont filter on "PPP Link is down" because I then I want to pick out the vehicle whose modem has been offline (PPP link down) 5 times in a row.

Do you know of a different approach to aggregate an index by "Vehicle.keyword", over the last x amount of time, and finally create a list of Vehicle's whose last 5 messages have the value "PPP link is down" for either the "ATT.PPP Status" OR "VZW.PPP Status" fields?

This seems like one of those "last known status" type questions with the added wrinkle of state being determined by the last 5 statuses.

It's another of the behavioural-analysis questions that benefits from using an entity-centric index rather than a log-centric index (the entity in question being a vehicle).

You can build these from your log data using the new dataframes API but the tricky bit of keeping the last 5 statuses will likely require the use of custom script.

@Mark_Harwood, I'm guessing dataframes requires an update to a later version of either Kibana or Elasticsearch? I'm confined to 5.6 on both Elastic and Kibana. I was really hoping to do my analysis with Watcher sending off Notifications based on results to Slack... all from the handy-dandy Kibana UI :disappointed: Really don't want to spin up something on some machine to query, analyze, alert but It's starting to appear like that's my only option.

Here’s the old way of doing it https://twitter.com/elasticmark/status/1009380268409610240?s=21

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.