Array_Compare working with Nested Aggregations

Micah_Hunsberger · August 7, 2018, 5:43pm

I have a watch that looks for a threshold number of login failures per host and username within a certain interval.

I am having trouble with the condition section of the watch. I need to determine whether a given user has failed to log in X times within the interval, but the query results come back as nested aggregations.

My search body section:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "task": "Logon"
          }
        },
        {
          "term": {
            "keywords": "Audit Failure"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "from": "{{ctx.trigger.scheduled_time}}||-{{ctx.metadata.interval}}",
              "to": "{{ctx.trigger.scheduled_time}}"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_host": {
      "terms": {
        "field": "beat.name",
        "min_doc_count": "{{ctx.metadata.attempt_threshold}}"
      },
      "aggs": {
        "group_by_user": {
          "terms": {
            "script": {
              "source": "doc['event_data.TargetDomainName'].value + '/' + doc['event_data.TargetUserName'].value",
              "lang": "painless"
            },
            "min_doc_count": "{{ctx.metadata.attempt_threshold}}"
          },
          "aggs": {
            "logins_over_time": {
              "date_histogram": {
                "field": "@timestamp",
                "interval": "{{ctx.metadata.interval}}",
                "min_doc_count": "{{ctx.metadata.attempt_threshold}}"
              }
            }
          }
        }
      }
    }
  }
}

This returns results in the following format:

{
  "_shards": {
    "...": "..."
  },
  "hits": {
    "hits": "..."
  },
  "took": 88,
  "timed_out": false,
  "aggregations": {
    "group_by_host": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "doc_count": 35,
          "key": "FooHost",
          "group_by_user": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "foodomain/foouser",
                "doc_count": 15,
                "logins_over_time": {
                  "buckets": [
                    {
                      "key_as_string": "2018-05-15T15:15:00.000Z",
                      "doc_count": 6,
                      "key": "unix_epoch_time"
                    },
                    {
                      "...":"..."
                    }
                  ]
                }
              },
              {
                "doc_count": 8,
                "logins_over_time": {
                  "buckets": [
                    { "..." : "..." }
                  ]
                },
                "key": "foodomain/baruser"
              }
            ]
          }
        },
        {
          "doc_count": 9,
          "key": "BarHost",
          "group_by_user": {
            "buckets": [
              { "..." : "..." }
            ]
          }
        }
      ]
    }
  }
}

I care about the logins_over_time.buckets arrays, and in the condition section I would like to have something along the lines of:

{
  "condition": {
    "array_compare": {
      "ctx.payload.aggregations.group_by_host.buckets.group_by_user.buckets.logins_over_time.buckets": {
        "path": "doc_count",
        "gte": {
          "value": "{{ctx.metadata.attempt_threshold}}"
        }
      }
    }
  }
}

But of course that doesn't work, because the logins_over_time.buckets array is nested inside the other aggregations. I can access a single element of the array by using ctx.payload.aggregations.group_by_host.buckets.0.group_by_user.buckets.0.logins_over_time.buckets, but that doesn't guarantee that the first bucket will be one that exceeds the threshold. Is Watcher capable of comparing nested arrays? If so, how? The documentation only shows a top-level array. Or is there another way to check for X events in Y amount of time? I'm wondering whether this job would be better suited to a machine learning model.

Thanks

_Sergey · August 8, 2018, 8:19am

Hey, some time ago I was also looking for a way to access every bucket in aggregations (I will follow this thread :) ).
But then I decided to use a bucket selector, so you can filter in your aggs for what you want, and the buckets then contain only the docs/keys that meet your condition. Then I just check the first bucket: if it exists, alert me; otherwise no docs match the condition, so no alert.

Hope this helps. It's not an answer to the main question, but anyway.
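For illustration, a minimal single-level sketch of that bucket selector idea might look like the following. It reuses the group_by_host name and the beat.name field from the original query; the over_threshold name and the literal 5 are placeholders and not something from my actual watch.

```json
{
  "size": 0,
  "aggs": {
    "group_by_host": {
      "terms": { "field": "beat.name" },
      "aggs": {
        "over_threshold": {
          "bucket_selector": {
            "buckets_path": { "attempts": "_count" },
            "script": "/* 5 is a placeholder threshold */ params.attempts >= 5"
          }
        }
      }
    }
  }
}
```

With this, group_by_host.buckets should only contain hosts whose doc_count reaches the threshold, so the condition only has to check whether any bucket came back. Note that bucket_selector prunes the buckets of its direct parent aggregation only, so with several nested levels the outer buckets would still be returned even when their inner buckets are empty.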

spinscale · August 8, 2018, 8:19am

The array_compare condition is not able to handle several nested levels. Use a script condition with a Painless script instead.
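A rough sketch of such a script condition, using the aggregation names from the search body above, could look like this. It assumes that metadata.attempt_threshold is stored as a number (not a string), and it has not been run against the original data.

```json
{
  "condition": {
    "script": {
      "lang": "painless",
      "source": "/* assumes ctx.metadata.attempt_threshold is numeric */ for (def host : ctx.payload.aggregations.group_by_host.buckets) { for (def user : host.group_by_user.buckets) { for (def slot : user.logins_over_time.buckets) { if (slot.doc_count >= ctx.metadata.attempt_threshold) { return true; } } } } return false;"
    }
  }
}
```

The script walks all three bucket levels and returns true as soon as any logins_over_time bucket reaches the threshold, which is the check that array_compare cannot express across nested arrays.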

Micah_Hunsberger · August 8, 2018, 5:52pm

@_Sergey thanks for the pointer, your suggestion pointed me in the right direction. I didn't end up using bucket_selector, but I did use another pipeline aggregation. Since in this scenario I only care whether some bucket passes a threshold within a time period, I could use the max_bucket pipeline aggregation.

Unfortunately, pipeline aggregations can also only handle a single level of nested buckets, so I had to use a max_bucket aggregation at each level, each one taking the maximum of the previous level's max_bucket:

"aggs": {
  "group_by_host": {
    "terms": { ... },
    "aggs": {
      "group_by_user": {
        "terms": { ... },
        "aggs": {
          "logins_over_time": {
            "date_histogram": { ... }
          },
          "max_logins_in_interval": {
            "max_bucket": {
              "buckets_path": "logins_over_time._count"
            }
          }
        }
      },
      "max_logins_for_users": {
        "max_bucket": {
          "buckets_path": "group_by_user>max_logins_in_interval"
        }
      }
    }
  },
  "max_logins_for_hosts": {
    "max_bucket": {
      "buckets_path": "group_by_host>max_logins_for_users"
    }
  }
}

Then, my query results had maximums for each level of aggregation, which means I could have the following condition clause:

"condition": {
  "compare": {
    "ctx.payload.aggregations.max_logins_for_hosts.value": {
      "gte": "{{ctx.metadata.attempt_threshold}}"
    }
  }
}

If that condition is met, I know that at least one of the logins_over_time buckets had a doc_count >= metadata.attempt_threshold.

Thanks!

