Machine Learning Functions

Suppose I have a datafeed that produces multiple docs at once (for example botnetalert, which triggered 20 times).

What is the best function to use so that I get just one alert covering all 20 events?

This all happens within a short period, of course.

I think you'll need to describe your use case in a little more detail before I can make a recommendation. I don't know what data you're analyzing, what detector functions you're using, or what you'd like your alerts to look like. Please provide samples or screenshots if that makes things easier. Thanks in advance.

Right now, this query is looking for any event.code:Login with specific accounts. As you can see, there are multiple signals for 1 event. I am asking which ML function would reduce this to 1 signal.

I think this question might be a little too complicated to answer in this medium - it might be appropriate to get our support or Professional Services involved to help you.

But, with that said, I could imagine a scenario in which you create an ML job whose datafeed runs an aggregated query against the signals index, counting the signals per rule name. ML could then apply the count function with a partition on the rule name. The query being made would look roughly like this:

GET .siem-signals-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "signal.status": "open"
                }
              }
            ],
            "must_not": [
              {
                "exists": {
                  "field": "signal.rule.building_block_type"
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2021-02-07T00:00:00.000Z",
              "lte": "2021-02-07T01:00:00.000Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "count_per_name": {
      "terms": {
        "field": "signal.rule.name",
        "size": 1000
      }
    }
  }
}

(but of course you'd remove the range clause and follow the suggestions in the documentation about how to get an ML job's datafeed to effectively utilize a query with aggregations)

This kind of query would yield results that roughly looked like:

and it is those aggregated counts that ML could analyze (with the count function and the summary_count_field_name setting) to track the count values over time.
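
To make that a bit more concrete, here is a rough sketch of what the job and its datafeed might look like. Treat it as an illustration under assumptions rather than a tested configuration: the job ID signals-per-rule, the datafeed ID, and the 15m bucket span and histogram interval are all made up for the example, and the exact options available depend on your stack version. The datafeed reuses the query above without the range clause and wraps the terms aggregation in a date_histogram. As far as I know, the datafeed expects each nested aggregation to be named after the field it operates on, and a max aggregation on the time field inside the histogram.

```console
# Hypothetical job: one count detector, partitioned by rule name.
# summary_count_field_name tells ML the input is pre-aggregated.
PUT _ml/anomaly_detectors/signals-per-rule
{
  "analysis_config": {
    "bucket_span": "15m",
    "summary_count_field_name": "doc_count",
    "detectors": [
      {
        "function": "count",
        "partition_field_name": "signal.rule.name"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}

# Hypothetical datafeed: same filter as the query above, no range clause.
PUT _ml/datafeeds/datafeed-signals-per-rule
{
  "job_id": "signals-per-rule",
  "indices": [".siem-signals-*"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "signal.status": "open" } }
      ],
      "must_not": [
        { "exists": { "field": "signal.rule.building_block_type" } }
      ]
    }
  },
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "15m"
      },
      "aggregations": {
        "@timestamp": {
          "max": { "field": "@timestamp" }
        },
        "signal.rule.name": {
          "terms": {
            "field": "signal.rule.name",
            "size": 1000
          }
        }
      }
    }
  }
}
```

With something like this, a burst of 20 signals from one rule would arrive as a single count of 20 for that rule in one bucket, so ML would score it once per rule per bucket rather than once per signal.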
