Threshold Detection Ignoring Group By Field

Hi

I have upgraded to 7.11.0 and now 7.11.1 and I think there is an issue with Threshold detections.

I have a simple detection based on my FW logs. It looks for flow_denied or flow_dropped messages and raises a detection if a single source.ip has >= 500 events.

So the query is straightforward and the group by field is source.ip.

This used to work fine and I have a timeline template which was showing me the expected results and working well.

The rule now seems to fire for almost anything and is missing many of the fields. Many of the results that are returned don't even have a source.ip field.

The query is correct, I can apply the same filter and build a visualisation and it shows me the results as expected.

I'm no expert on this, so I may be wrong here. However, I have viewed my detection rule and everything is in place. When I click Preview, it shows the rule would be very noisy. If I then click the Inspect button for that preview, it shows me the query that was sent, shown below.

I don't fully follow the query language, but I can't see anything in there that references the grouping on source.ip or the threshold of 500 events. At a minimum, I would expect to find source.ip somewhere in that block of text, but it isn't there.

As I say, this may be my misunderstanding, but if not, there does seem to be a problem with threshold detections that was introduced after 7.10.2.

I have also tried creating a new rule in case the old one failed to migrate from 7.10.2 to 7.11.x for some reason but the result is the same.

Thanks in advance

Phil

```json
{
  "aggregations": {
    "eventActionGroup": {
      "terms": {
        "field": "event.category",
        "missing": "All others",
        "order": {
          "_count": "desc"
        },
        "size": 10
      },
      "aggs": {
        "events": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "112500ms",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 1613742526822,
              "max": 1613746126822
            }
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [],
            "filter": [
              {
                "bool": {
                  "filter": [
                    {
                      "bool": {
                        "should": [
                          {
                            "match": {
                              "event.module": "panw"
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "bool": {
                              "should": [
                                {
                                  "match": {
                                    "event.action": "flow_denied"
                                  }
                                }
                              ],
                              "minimum_should_match": 1
                            }
                          },
                          {
                            "bool": {
                              "should": [
                                {
                                  "match": {
                                    "event.action": "flow_dropped"
                                  }
                                }
                              ],
                              "minimum_should_match": 1
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    }
                  ]
                }
              }
            ],
            "should": [],
            "must_not": []
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2021-02-19T13:48:46.822Z",
              "lte": "2021-02-19T14:48:46.822Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "track_total_hits": true
}
```
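For comparison, a search that actually applies the grouping would need an aggregation on source.ip, which is exactly what is missing from the inspected request above. The request below is a hand-written sketch, not the query the detection engine sends. It reuses the filters from the preview request and assumes event.action is a keyword field (as it is in ECS); the 5-minute look-back (now-5m) and the cap of 100 returned groups are illustrative assumptions. Each bucket that comes back is a source.ip with at least 500 matching events, which roughly corresponds to one threshold signal.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "match": { "event.module": "panw" } },
        { "terms": { "event.action": ["flow_denied", "flow_dropped"] } },
        { "range": { "@timestamp": { "gte": "now-5m", "lte": "now" } } }
      ]
    }
  },
  "aggs": {
    "by_source_ip": {
      "terms": {
        "field": "source.ip",
        "min_doc_count": 500,
        "size": 100
      }
    }
  }
}
```

This can be sent from Kibana Dev Tools as the body of a _search request against whichever index pattern the rule uses. With "size": 0 only the per-IP buckets are returned, not the events themselves.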

Hi @PhilA! Thanks for the question, and sorry you're experiencing problems... I wanted to clarify a few things regarding changes in functionality in 7.11+.

We updated the functionality in 7.11 so that the fields queried in the original events are NOT copied into the signals. Those fields do not necessarily have the same value across all matching events (the query may contain wildcards, for example), so copying them was ambiguous. That information is now provided by Timeline: when you click 'Investigate in timeline', the original events are pulled back and you can see everything that matched.

Additionally, as far as I know, the Preview functionality has never incorporated the grouping: it just summarizes the matching events, not the threshold signals themselves. This is a bit confusing, I admit, and feedback on that is most certainly welcome.

Does this address your concerns, or are there additional problems? You mentioned that your rule is alerting on almost anything? You should get 1 signal per bucket/group each time the rule runs over a set of events that meets your threshold. If you're seeing extraneous signals, I'm happy to help you investigate.

Thanks!
Madi

@PhilA I may be wrong about the preview functionality... it looks like there may be a bug in parsing the form values that's preventing the bucketing code path from being hit when 'Preview' is selected. Investigating...

Hi Madi

Thank you for the reply and your help.

I think there is a bug with the preview functionality. The problems I was having when I created this post were with a threshold query. Yesterday I did some more testing with a simple Custom Query detection. The query was simple, but the preview showed zero results, while the same query in Discover did show results. When I inspected the query that was sent, there did seem to be some differences.

I definitely seem to have fewer detections firing than I am used to, so something doesn't seem quite right. The detections that do fire seem to have an issue with Timeline: the timeline templates I created on 7.10.2 don't seem to work, and the template fields I created don't populate.

I will try to run some more tests today, creating a basic rule that fires when I log in, to see if I can reliably replicate the issue.

Thanks again for your help

Phil


Hi Madi

I tried to create a test based on me logging in over VPN and as you'd expect, it worked perfectly!

Going back to one that does regularly seem to misbehave...

I have a Threshold detection which looks for denied connections through our FW and should generate an alert if a single source.ip generates 1000 failed connections in 5 minutes. The preview shows that there would be many hits. One point worth noting is that the 5-minute time window is not defined until later in the rule build (Step 3, Schedule rule). Is it therefore not considered in this preview, or is it used anyway because 5 minutes is the default value?

At the same time, I ran a data table visualisation looking back over an hour, grouped by source.ip, which should effectively give similar data. Note that only 2 IPs have >1000 events within the hour (and that is without considering that the rule looks at a 5-minute window).
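To compare like with like, the hourly data table can be split into 5-minute slices, keeping only the source IPs that reach 1000 events within a slice. The request below is a sketch that assumes the second rule uses the same filters as the first one; the aggregation names are arbitrary. Note that date_histogram buckets are fixed, clock-aligned intervals, whereas the rule looks back over its own window each time it runs, so counts near bucket boundaries can differ. The arithmetic is telling on its own, though: any IP that reaches 1000 events within a 5-minute slice must also have at least 1000 events over the whole hour, so a correctly grouped preview over that hour should only flag IPs from the top of the hourly table, not many.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "match": { "event.module": "panw" } },
        { "terms": { "event.action": ["flow_denied", "flow_dropped"] } },
        { "range": { "@timestamp": { "gte": "now-1h", "lte": "now" } } }
      ]
    }
  },
  "aggs": {
    "per_5m_window": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m"
      },
      "aggs": {
        "noisy_source_ips": {
          "terms": {
            "field": "source.ip",
            "min_doc_count": 1000,
            "size": 10
          }
        }
      }
    }
  }
}
```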

I do notice that when the detection is active it is not very noisy, so I believe it may actually be working in practice and it is just the preview that has the issue.

Your explanation above is useful, however, and does explain why I had the problem in the first place and started investigating this. You note that fields queried in the original events will NOT be in the signal. I had created a Timeline Template which showed me all failed connections from the source.ip that had generated the signal. That Timeline Template now fails because source.ip is not available. Without it I just get a load of events, and they do not relate to the IP of interest unless I manually add it back into the filter.

To view the events of interest, I now need to open the signal, find the IP (which seems to be in the field signal.threshold_result.value), launch a timeline and then manually add the source.ip filter. Am I making this harder than it needs to be?

Thanks for your support

Phil
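Until Timeline fills this in automatically, the manual workaround boils down to filtering the original events on the IP copied from signal.threshold_result.value. A sketch of that search in query DSL follows; 192.0.2.10 is a hypothetical placeholder from the documentation address range, and the filters and 5-minute range are assumed to match the rule.

```json
{
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        { "match": { "event.module": "panw" } },
        { "terms": { "event.action": ["flow_denied", "flow_dropped"] } },
        { "term": { "source.ip": "192.0.2.10" } },
        { "range": { "@timestamp": { "gte": "now-5m", "lte": "now" } } }
      ]
    }
  },
  "sort": [{ "@timestamp": "desc" }]
}
```

In the Timeline query bar, the equivalent KQL filter would be `source.ip : "192.0.2.10"`, again with the placeholder replaced by the real value from the signal.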

Thanks for the update, @PhilA! @yctercero is actively working on fixing the bug in Preview, and I believe we will have some fixes in 7.12 that will at least partially address your timeline concerns. The Timeline functionality for threshold rules is a little unreliable currently, but will be tightened up in the upcoming 7.12 release. You should be able to visualize all the events that made up the signal in Timeline out of the box (the source.ip matches should be loaded into your timeline automatically when visualized). We also have some fixes underway that will improve the signal viewing experience. I will endeavor to keep you updated on progress. Please let me know if there's anything else I can help with!

Thanks,
Madi

Thanks for the update @madi. I did go back to using the default timeline rather than my custom one, but that doesn't seem to have worked: it still doesn't show me the source IP that caused the signal to fire. It sounds like there are changes coming soon, so I will keep an eye out for 7.12 and see what happens after the upgrade.

Thanks for your support.

