Easy way to insert top-level query aggregation details back into Elasticsearch


(Sandeep Takhar) #1

Hi.

Been following Zach's pretty cool work on getting the eBay anomaly detection algorithm working in Elasticsearch.

In his example he has three levels of grouping. I just want the top-level term and its ninetieth_surprise value, and I want to send that back into Elasticsearch; I'm looking for ideas. I've searched around but haven't found much.

I have the example working with my data; all I did was change three values to match my field names.

Here is Zach's example:

There are 5 "metrics" with 5 ninetieth_percentiles in the JSON (in my JSON I get two ninetieth percentiles for each top-level bucket for some reason). My JSON response has all the bottom-level buckets; I just want to extract the top-level metric name and its ninetieth percentile, which is part of the top-level bucket.

{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "hour": {
            "gte": "{{start}}",
            "lte": "{{end}}"
          }
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "metrics": {
      "terms": {
        "field": "metric",
        "size": 5
      },
      "aggs": {
        "queries": {
          "terms": {
            "field": "query",
            "size": 500
          },
          "aggs": {
            "series": {
              "date_histogram": {
                "field": "hour",
                "interval": "hour"
              },
              "aggs": {
                "avg": {
                  "avg": {
                    "field": "value"
                  }
                },
                "movavg": {
                  "moving_avg": {
                    "buckets_path": "avg",
                    "window": 24,
                    "model": "simple"
                  }
                },
                "surprise": {
                  "bucket_script": {
                    "buckets_path": {
                      "avg": "avg",
                      "movavg": "movavg"
                    },
                    "script": "(avg - movavg).abs()"
                  }
                }
              }
            },
            "largest_surprise": {
              "max_bucket": {
                "buckets_path": "series.surprise"
              }
            }
          }
        },
        "ninetieth_surprise": {
          "percentiles_bucket": {
            "buckets_path": "queries>largest_surprise",
            "percents": [90.0]
          }
        }
      }
    }
  }
}
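The heart of the pipeline above is the movavg/surprise pair: for each bucket, surprise is the absolute distance between that bucket's average and a simple moving average of the preceding window. A rough standalone sketch of that arithmetic in Python (Elasticsearch's exact windowing details may differ):

```python
from collections import deque

def surprise_series(values, window=24):
    """For each value, surprise = |value - simple moving average of the
    previous `window` values| (None until a moving average exists)."""
    history = deque(maxlen=window)
    surprises = []
    for v in values:
        if history:
            movavg = sum(history) / len(history)
            surprises.append(abs(v - movavg))
        else:
            surprises.append(None)  # no prior buckets yet
        history.append(v)
    return surprises
```

A steady series followed by a spike makes the idea concrete: `surprise_series([10, 10, 10, 50])` yields a large surprise only at the spike.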


(Sandeep Takhar) #2

One thing I'm doing is using filter_path. Zach actually mentioned it in his second article, and it took me a while to find. My top-level field name is different, but it looks like this:

pretty=true&human=false&flat_settings=true&filter_path=aggregations.agent_names.buckets.key,aggregations.agent_names.buckets.ninetieth_surprise.values

I think I'll just use Logstash to read the file this outputs and dump it into Elasticsearch... I already have a framework for doing exactly that, and I've dealt with JSON objects before.

I don't know whether it's a good way to do it, but I'll give it a try.
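In effect, that filter_path prunes the response down to just each bucket's key and its ninetieth_surprise values. A hypothetical Python helper doing the same pruning client-side, for comparison:

```python
def trim_response(resp):
    """Keep only what the filter_path above keeps: each bucket's
    key plus its ninetieth_surprise values; drop everything else."""
    buckets = resp["aggregations"]["agent_names"]["buckets"]
    return {"aggregations": {"agent_names": {"buckets": [
        {"key": b["key"],
         "ninetieth_surprise": {"values": b["ninetieth_surprise"]["values"]}}
        for b in buckets
    ]}}}
```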


(Sandeep Takhar) #3

Here is how I flattened and split the resulting output... now I'll just send it to Elasticsearch using the output plugin. Again, not sure it's the best way, but it works:

input {
  stdin { codec => json }
}

# trick to reparse the message when it is brought in from a text file
filter {
  grok {
    match => ["message", "%{GREEDYDATA:msg}"]
  }
  json {
    source => "msg"
  }
}

filter {
  mutate {
    rename => [
      "[aggregations][agent_names][buckets]", "buckets"
    ]
    remove_field => "aggregations"
  }
}

filter {
  split {
    field => "buckets"
  }
}

filter {
  mutate {
    rename => [
      "[buckets][key]", "agent_name",
      "[buckets][ninetieth_surprise][values][90.0]", "ninetieth_surprise"
    ]
    remove_field => "buckets"
  }
}

output {
  stdout { codec => rubydebug }
}
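For anyone not using Logstash, the same split-and-rename can be sketched in a few lines of Python (field names match the config above):

```python
def flatten(resp):
    """Do what the split + mutate filters above do: emit one flat
    event per bucket, renaming key -> agent_name and pulling the
    90th-percentile value up to ninetieth_surprise."""
    events = []
    for b in resp["aggregations"]["agent_names"]["buckets"]:
        events.append({
            "agent_name": b["key"],
            "ninetieth_surprise": b["ninetieth_surprise"]["values"]["90.0"],
        })
    return events
```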


(Sandeep Takhar) #4

Here is the data I get using filter_path, or at least part of it, that can be used with the above config, in case anyone is following. I'll create a cron job now and see how the data looks in Timelion. There are plenty of null values because it's a staging environment and it's not very busy at the moment.

{"aggregations":{"agent_names":{"buckets":[
{"key":"custmgtbill2srv1","ninetieth_surprise":{"values":{"90.0":0.00917166937529728}},"ninetieth_surprise":{"values":{"90.0":0.00917166937529728}}},
{"key":"entmarketoffersvc3","ninetieth_surprise":{"values":{"90.0":0.016666666666666666}},"ninetieth_surprise":{"values":{"90.0":0.016666666666666666}}},
{"key":"custmgtconsumersvc1","ninetieth_surprise":{"values":{"90.0":0.017316291670002933}},"ninetieth_surprise":{"values":{"90.0":0.017316291670002933}}},
{"key":"custmgtconsumersvc4","ninetieth_surprise":{"values":{"90.0":0.01162318909094097}},"ninetieth_surprise":{"values":{"90.0":0.01162318909094097}}},
{"key":"billpresentmentweb3","ninetieth_surprise":{"values":{"90.0":0.0}},"ninetieth_surprise":{"values":{"90.0":0.0}}},
{"key":"custmgtconsumerweb4","ninetieth_surprise":{"values":{"90.0":1.4005602240896359E-6}},"ninetieth_surprise":{"values":{"90.0":1.4005602240896359E-6}}},
{"key":"custmgtfulweb1","ninetieth_surprise":{"values":{"90.0":0.0}},"ninetieth_surprise":{"values":{"90.0":0.0}}},
{"key":"custmgtfulweb3","ninetieth_surprise":{"values":{"90.0":0.0}},"ninetieth_surprise":{"values":{"90.0":0.0}}},
{"key":"custmgtfulweb4","ninetieth_surprise":{"values":{"90.0":0.0}},"ninetieth_surprise":{"values":{"90.0":0.0}}}
]}}}
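The duplicated ninetieth_surprise key inside each bucket (the "two ninetieth percentiles" mentioned in the first post) is mostly harmless downstream: typical JSON parsers keep a single copy, and both copies carry the same value. Python's json module, for instance, keeps the last occurrence:

```python
import json

# Duplicate keys in a JSON object: json.loads keeps the last one seen.
bucket = json.loads(
    '{"key": "x", "ninetieth_surprise": {"values": {"90.0": 1.0}},'
    ' "ninetieth_surprise": {"values": {"90.0": 1.0}}}'
)
print(bucket["ninetieth_surprise"]["values"]["90.0"])  # → 1.0
```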


(Sandeep Takhar) #5

I'll also post the query body from the curl command I run in a cron job, in case someone runs into this post, to round out the turn-key solution (?)

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "range": {
              "@timestamp": {
                "gte": "now-1h",
                "lte": "now"
              }
            }
          },
          {
            "terms": { "metric_name": ["durationmean"] }
          }
        ]
      }
    }
  },
  "size": 0,
  "aggs": {
    "agent_names": {
      "terms": {
        "field": "agent_name",
        "size": 5000
      },
      "aggs": {
        "metric_names": {
          "terms": {
            "field": "metric_name",
            "size": 10000
          },
          "aggs": {
            "series": {
              "date_histogram": {
                "field": "@timestamp",
                "interval": "minute"
              },
              "aggs": {
                "avg": {
                  "avg": {
                    "field": "metric_value"
                  }
                },
                "movavg": {
                  "moving_avg": {
                    "buckets_path": "avg",
                    "window": 60,
                    "model": "simple"
                  }
                },
                "surprise": {
                  "bucket_script": {
                    "buckets_path": {
                      "avg": "avg",
                      "movavg": "movavg"
                    },
                    "script": "(avg - movavg).abs()"
                  }
                }
              }
            },
            "largest_surprise": {
              "max_bucket": {
                "buckets_path": "series.surprise"
              }
            }
          }
        },
        "ninetieth_surprise": {
          "percentiles_bucket": {
            "buckets_path": "metric_names>largest_surprise",
            "percents": [90.0]
          }
        }
      }
    }
  }
}
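To close the loop without Logstash, the flattened events could also be pushed straight back via the _bulk API. A minimal sketch; the index name and @timestamp field here are assumptions, and older Elasticsearch versions also expect a _type in the action line:

```python
import json
from datetime import datetime, timezone

def bulk_body(events, index="surprise-metrics"):
    """Build an Elasticsearch _bulk request body (NDJSON): one action
    line plus one document line per flattened event. `index` and the
    added @timestamp field are illustrative assumptions."""
    now = datetime.now(timezone.utc).isoformat()
    lines = []
    for e in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({**e, "@timestamp": now}))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
```

The returned string can then be POSTed to the cluster's _bulk endpoint from the same cron job.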


(system) #6