Detection rules hitting all data tiers

Hi,

We are using the Elasticsearch detection engine to run detections against our log data. Most of our detection rules are executed against the logs-* index pattern. These rules run every 5 minutes with a look-back of 4 minutes.

What we are seeing is queries from these detection rules running against our cold and frozen tiers, even though all current data should be in the hot tier. Has anyone else seen this? Is there a way to limit the time range so the rules only touch the hot tier / the last 10 minutes of data?

Hello there,

With those settings, the rule runs every 5 minutes but only analyzes documents added to the indices during the last 9 minutes (the 5-minute interval plus the 4-minute look-back).
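To make the arithmetic explicit, here is a minimal sketch (Python, purely for illustration; the actual scheduling is done by Kibana's detection engine) of how the interval and look-back combine into the effective query window:

```python
from datetime import datetime, timedelta, timezone

def effective_window(interval_minutes, lookback_minutes, now=None):
    """Return (start, end) of the time range one rule execution covers.

    The detection engine looks back over the rule interval plus the
    additional look-back, so a 5-minute interval with a 4-minute
    look-back covers 9 minutes.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=interval_minutes + lookback_minutes)
    return start, end

start, end = effective_window(5, 4)
```

So nothing older than 9 minutes should be needed to satisfy the query.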

  1. Can you validate the indices included in the rule's index pattern are all in the hot nodes?
  2. Can you share a screenshot of the rule definition to check the content?

Regards

I am setting these rules up as per Create a detection rule | Elastic Security Solution [8.3] | Elastic.

The index pattern is logs-endpoint.events.*.

The ILM policy moves these indices to the warm tier upon rollover, to cold after 15 days, and to frozen after 30 days. Data from the last 10 minutes should always be in the hot or warm tier.
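For context, the tier transitions described above would correspond to an ILM policy roughly like the following. This is a sketch of the JSON body (shown as a Python dict) you would PUT to `_ilm/policy/<name>`; the `min_age` values match what I described, while the rollover thresholds and the snapshot repository name are illustrative assumptions, not our real values:

```python
# Sketch of the ILM policy described above. min_age values follow the post;
# rollover limits and "my-snapshot-repo" are illustrative placeholders.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "30d"}
                }
            },
            # warm immediately after rollover
            "warm": {"min_age": "0d", "actions": {}},
            # cold after 15 days
            "cold": {"min_age": "15d", "actions": {}},
            # frozen after 30 days (frozen tier uses a searchable snapshot)
            "frozen": {
                "min_age": "30d",
                "actions": {
                    "searchable_snapshot": {"snapshot_repository": "my-snapshot-repo"}
                },
            },
        }
    }
}
```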

example rule:

Thanks,

The configuration seems right; it shouldn't be querying data older than 9 minutes.

Got 2 questions:

  • How are you validating that this specific rule is causing queries to the other tiers? Could it be another rule with bad timing, or a watcher, running some bad queries?

  • If you run the same EQL query against the same index (change the Data Source index pattern to the one used in the rule) from within the Timeline investigation tool, under the Correlation tab, can you validate whether the queries hit other tiers?
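Another way to reproduce the rule's search outside Kibana is to hit the EQL search API directly with an explicit range filter, so you can compare which shards get searched. A sketch of the request body (the EQL query string here is a placeholder, not the rule's actual query; the `filter` field of `_eql/search` takes standard query DSL):

```python
import json

# Body for: POST /logs-endpoint.events.*/_eql/search
# "any where true" is a placeholder for the rule's real EQL query.
# The filter restricts the search to the rule's 9-minute window.
eql_request = {
    "query": "any where true",
    "filter": {
        "range": {"@timestamp": {"gte": "now-9m", "lte": "now"}}
    },
}

print(json.dumps(eql_request, indent=2))
```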

Going to jump in on this conversation to add a few pieces of information:

  1. I do believe that the query "hits" older indices, but I'm not 100% sure if it is actually querying them, just checking the date range filter against them, or doing something else entirely.
    • The way I can confirm this is happening is by looking at the Elasticsearch audit logs. (Note: I don't know which SIEM rule was in your screenshot, so I picked Startup/Logon Script added to Group Policy Object from the prebuilt rules (8.3.2).) Looking at the rule as defined, it should only look back 1 minute, but if I check the Elasticsearch audit logs, filtering for elasticsearch.audit.apikey.name: "Alerting: siem.queryRule/Startup/Logon Script added to Group Policy Object", and then look at the field elasticsearch.audit.indices, I can see a large list of indices. One example is restored-.ds-logs-system.application-dev-2022.03.10-000008, which is in my cluster's cold tier (and not an index I would expect data from 1 minute ago to be in).
  2. I think there is a gap on the Elasticsearch side: there isn't a real way to profile EQL queries, so trying to identify what the query is actually doing here is pretty challenging (unless you know EQL under the hood?).
  3. Kibana SIEM rules also have a gap here, I think: you can't actually see the query that's being run. While you can see the EQL part, and you could probably assume that the Look-back field just applies a range filter, you can't say that for sure, since you can't see the query that Kibana is compiling and executing.
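The audit-log check from point 1 can be scripted. A sketch, assuming JSON-lines audit logs with the two dotted field names mentioned above as top-level keys (real audit logs may nest these fields differently, so treat this as illustrative):

```python
import json

def indices_touched_by_rule(audit_lines, apikey_name):
    """Collect every index a given rule's API key queried, from audit log lines.

    Assumes each line is a JSON object with the flat keys
    "elasticsearch.audit.apikey.name" and "elasticsearch.audit.indices".
    """
    indices = set()
    for line in audit_lines:
        event = json.loads(line)
        if event.get("elasticsearch.audit.apikey.name") == apikey_name:
            indices.update(event.get("elasticsearch.audit.indices", []))
    return sorted(indices)

# Example with a fabricated log line mirroring the fields from the post:
rule_key = ("Alerting: siem.queryRule/Startup/Logon Script added "
            "to Group Policy Object")
sample = json.dumps({
    "elasticsearch.audit.apikey.name": rule_key,
    "elasticsearch.audit.indices": [
        "restored-.ds-logs-system.application-dev-2022.03.10-000008"
    ],
})
touched = indices_touched_by_rule([sample], rule_key)
```

Any index in the result that lives in the cold or frozen tier is one the rule reached outside its expected window.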

For reference, I have noticed this occasionally in the past, where warm/cold node searches would tick up on the addition of new SIEM rules. I generally just ignored it, as it never caused me much of a problem.


So the load on our cluster (all nodes) went up when we upgraded from 8.2.3 to 8.3.2. The load was primarily due to search threads. While working with support, they identified a few queries causing the load to be high. We also ended up disabling a few detection rules.

The load dropped as soon as we upgraded from 8.3.2 to 8.3.3. I'm not sure why that would be the case; I went through the changelog and didn't find anything significant.