Hello Community!
I am having a problem formulating a query in Elasticseach. I want to create a watcher to alert me on slack, the top hits that exceed the 4% of all the documents in a time interval. So in the query of the watcher, I have to group the IPs by the field called ClientIP defined as IP type and return the ones that have an occurrence greater than or equal to 4%.
I've been working on a query and I can't obtain the 'doc_count' results that are obtained through bucket aggregation, to be able to calculate the percentage of each IP. For example, the IP 31.222.183.224 appears 15.488 of a total of 1.361.938 documents so it's the 1% of occurrence.
In this case, the query is composed of an aggregation, containing a grouping by terms and the total count of the documents in a certain range of time. In addition, I include a bucket aggregation in order to filter out terms that have a document count greater than 4%.
{
"query": {
"bool": {
"filter": [{
"range": {
"EdgeStartTimestamp": {
"gte": "now-15m"
}
}
}]
}
},
"aggs": {
"hist": {
"date_histogram": {
"field": "EdgeStartTimestamp",
"calendar_interval": "week"
},
"aggs": {
"total_ips": {
"cardinality": {
"field": "_id"
}
},
"top_hits_ip": {
"terms": {
"field": "ClientIP",
"size": 10
}
},
"percent_4": {
"bucket_script": {
"buckets_path": {
"top_hits_ip": "top_hits_ip['doc_count']",
"total_ips": "total_ips"
},
"script": "0.04 < params.top_hits_ip/params.total_ips"
}
}
}
}
}
}
And I get the following response:
{
"took" : 579,
"timed_out" : false,
"_shards" : {
"total" : 117,
"successful" : 117,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"hist" : {
"buckets" : [
{
"key_as_string" : "2021-10-11T00:00:00.000Z",
"key" : 1633910400000,
"doc_count" : 1379400,
"top_hits_ip" : {
"doc_count_error_upper_bound" : 1593,
"sum_other_doc_count" : 1349939,
"buckets" : [
{
"key" : "31.222.183.224",
"doc_count" : 15488
},
{
"key" : "31.95.168.236",
"doc_count" : 8636
},
{
"key" : "192.98.231.3",
"doc_count" : 5337
}
]
},
"total_ips" : {
"value" : 1361938
}
}
]
}
}
}
Nevertheless, I'm not obtaining the buckets 'doc_count' in the correct way. How can I get the 'doc_count' to divide it into the total amount of results so I filter the necessary IPs by the percentage?
"buckets" : [
{
"key" : "31.222.183.224",
"doc_count" : 15488
},
Thank you in advance!