Obtain doc_count values from terms aggregation in bucket

Hello Community!

I am having a problem formulating a query in Elasticseach. I want to create a watcher to alert me on slack, the top hits that exceed the 4% of all the documents in a time interval. So in the query of the watcher, I have to group the IPs by the field called ClientIP defined as IP type and return the ones that have an occurrence greater than or equal to 4%.

I've been working on a query and I can't obtain the 'doc_count' results that are obtained through bucket aggregation, to be able to calculate the percentage of each IP. For example, the IP 31.222.183.224 appears 15.488 of a total of 1.361.938 documents so it's the 1% of occurrence.

In this case, the query is composed of an aggregation, containing a grouping by terms and the total count of the documents in a certain range of time. In addition, I include a bucket aggregation in order to filter out terms that have a document count greater than 4%.

{
	"query": {
		"bool": {
			"filter": [{
				"range": {
					"EdgeStartTimestamp": {
						"gte": "now-15m"
					}
				}
			}]
		}
	},
  "aggs": {
    "hist": {
      "date_histogram": {
        "field": "EdgeStartTimestamp",
        "calendar_interval": "week"
      },
      "aggs": {
        "total_ips": {
          "cardinality": {
            "field": "_id"
          }
        },
        "top_hits_ip": {
          "terms": {
            "field": "ClientIP",
            "size": 10
          }
        },
        "percent_4": {
          "bucket_script": {
            "buckets_path": {
              "top_hits_ip": "top_hits_ip['doc_count']",
              "total_ips": "total_ips"
            },
            "script": "0.04 < params.top_hits_ip/params.total_ips"
          }
        }
      }
    }
  }
}

And I get the following response:

{
  "took" : 579,
  "timed_out" : false,
  "_shards" : {
    "total" : 117,
    "successful" : 117,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "hist" : {
      "buckets" : [
        {
          "key_as_string" : "2021-10-11T00:00:00.000Z",
          "key" : 1633910400000,
          "doc_count" : 1379400,
          "top_hits_ip" : {
            "doc_count_error_upper_bound" : 1593,
            "sum_other_doc_count" : 1349939,
            "buckets" : [
              {
                "key" : "31.222.183.224",
                "doc_count" : 15488
              },
              {
                "key" : "31.95.168.236",
                "doc_count" : 8636
              },
              {
                "key" : "192.98.231.3",
                "doc_count" : 5337
              }
            ]
          },
          "total_ips" : {
            "value" : 1361938
          }
        }
      ]
    }
  }
}

Nevertheless, I'm not obtaining the buckets 'doc_count' in the correct way. How can I get the 'doc_count' to divide it into the total amount of results so I filter the necessary IPs by the percentage?

"buckets" : [
              {
                "key" : "31.222.183.224",
                "doc_count" : 15488
              },

Thank you in advance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.