Security rules failing (timed out) all the time

I have a test Elasticsearch cluster (7.14.0 before, 7.15.0 now) to which I'm sending Filebeat's Threat Intel data (50,000 documents in the filebeat-* index) and firewall data in an index with fewer than 1,000,000 messages.
I've configured a simple Security rule that matches source.ip from the firewall with threatintel.indicator.ip. Every time it runs it gives me this error:

An error occurred during rule execution: message: "Request timed out" name: "ip_maliciosas" id: "501312b0-222e-11ec-bb83-bff19fd32bc0" rule id: "fb7b64fe-fcb2-4f0f-8b5f-d881811ee01a" signals index: ".siem-signals-default"

This cluster is a stand-alone server with 64 GB of RAM, 16 CPUs, and plenty of disk. There is nothing else in the cluster besides these two indices.
I don't understand why it's timing out.
What should I do? Increase the timeout? In Kibana? In Elasticsearch? If I'm having this problem with such a simple rule, should I forget about having any more complex ones?
Thanks!

Change the default search interval in the threat intel rule. I'm on full SSD and still struggle with the defaults. Changing to 30 minutes is good; going to 1 hour is better. Considering this is a passive setup, not an active HIDS/IPS, a few extra minutes won't hurt too much. If you really want to fine-tune it, check the rule monitoring tab and watch its execution time, then convert that into minutes and add 5 for wiggle room.
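Besides the UI, the rule's run interval can also be changed through Kibana's detection-engine API. A minimal sketch, assuming Kibana on localhost:5601 and basic auth (the rule_id is the one from the error message above; everything else is illustrative):

```shell
# Sketch: bump the rule's run interval to 60 minutes via the
# detection-engine API (host and credentials are assumptions)
curl -X PATCH -u elastic 'localhost:5601/api/detection_engine/rules' \
  -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
  -d '{"rule_id": "fb7b64fe-fcb2-4f0f-8b5f-d881811ee01a", "interval": "60m"}'
```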

Also look at setting the translog flush threshold to 1 GB vs the default; that will pool a ton of writes into a larger cache, which helps a lot with HDDs. Set your Filebeat index refresh interval to 30 seconds or 1 minute and you'll be golden.
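As a sketch, both settings can be applied dynamically with the index settings API (the index pattern and credentials here are assumptions; adjust to your setup):

```shell
# Sketch: raise the translog flush threshold and lengthen the refresh
# interval in one call (index name/credentials are illustrative)
curl -X PUT -u elastic 'localhost:9200/filebeat-*/_settings?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"translog.flush_threshold_size": "1gb", "refresh_interval": "30s"}}'
```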

Try dropping the extra indices the rule searches (endgame, for example, if you're not on a Platinum license, and packetbeat). While it doesn't help too much, every ms counts.

The defaults work, but they're far from ideal as soon as any workload hits them. You would have to deploy a lot of HDDs; 8, for example, won't be able to take the workload, while going to 30 gets you closer to the IO requirements for near-instant response. This is purely from what I've seen in my environment; I don't know yours, so take it with a grain of salt.

Thanks for your answer.
Since the day I originally posted this I've made some changes. For example, I've now set the rule to run every 60 minutes (plus 1 minute of look-back). With that, it works 3 or 4 times a day (so, around 20 failures per day).

Regarding rule monitoring, it's hard to check, because most of the runs fail, so there is no information in the monitoring tab. I'll check it when a successful run occurs.

I've set up ILM to delete the data after 3 hours in order to keep that index really small (normally it's between 500 MB and 1 GB, 10-20 million docs). Filebeat's index is around 160 MB and 120,000 docs. My firewall index's flush time is around 6 seconds (I know this from running curl against localhost:9200/firewall_index/_stats/flush), so I don't know if the translog flush threshold would help much. So far, I know how to change it manually with this:

curl -X PUT -u elastic 'localhost:9200/firewall_index/_settings?pretty' -H 'Content-Type: application/json' -d '{"index":{"translog.flush_threshold_size" : "1024MB"}}'

If I want to set it permanently, do I have to add it in elasticsearch.yml? Something like:

index.translog.flush_threshold_size: '1G'

Something like that?
When you say to set my Filebeat index refresh to 30 seconds or 1 minute, do you mean in the Filebeat template, in the index settings, changing this:

"refresh_interval": "5s",

into this:

"refresh_interval": "60s",

Lastly, I didn't get the part about "dropping the extra index". What do you mean by that?

With a tiny index size you're spot on, the flush size won't help at all.

Correct on setting the refresh interval a little longer. Try increments of 10s until the cluster isn't struggling.
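One way to tell whether the cluster is struggling while you tune is to watch the search and write thread pools for queued or rejected tasks. A minimal sketch, assuming the same host and credentials as the earlier commands:

```shell
# Sketch: check for search/write thread-pool backpressure; growing
# queue or rejected counts suggest the cluster is still struggling
curl -u elastic 'localhost:9200/_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected'
```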


Here's an example of what I mean by removing them. It does not work well with the custom indexes I have tried. The fields need to be named the same as in the default index, along with the same type ("numeric", "text").
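The list of index patterns a rule searches can be trimmed through the detection-engine API as well as in the UI. A minimal sketch, assuming Kibana on localhost:5601 (the pattern list is illustrative; the rule_id is the one from the error message earlier in the thread):

```shell
# Sketch: limit the rule to only the indices it actually needs,
# dropping the unused defaults (values are illustrative)
curl -X PATCH -u elastic 'localhost:5601/api/detection_engine/rules' \
  -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
  -d '{"rule_id": "fb7b64fe-fcb2-4f0f-8b5f-d881811ee01a", "index": ["filebeat-*", "firewall_index"]}'
```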

That's a really short time for the ILM policy. Is it because of limited space, or is the data just not needed for a long period?
What do the settings in your threatintel.yml for Filebeat look like?

This cluster is only for alerts. That's why I have it configured with such a short ILM policy.
Here is my threatintel.yml:

- module: threatintel
  abuseurl:
    enabled: true
    var.input: httpjson
    var.url: https://urlhaus-api.abuse.ch/v1/urls/recent/
    var.interval: 10m
  abusemalware:
    enabled: true
    var.input: httpjson
    var.url: https://urlhaus-api.abuse.ch/v1/payloads/recent/
    var.interval: 10m
  malwarebazaar:
    enabled: true
    var.input: httpjson
    var.url: https://mb-api.abuse.ch/api/v1/
    var.interval: 10m
  misp:
    enabled: false
    var.input: httpjson
    var.url: https://SERVER/events/restSearch
    var.api_token: API_KEY
    var.first_interval: 300h
    var.interval: 5m
  otx:
    enabled: true
    var.input: httpjson
    var.url: https://otx.alienvault.com/api/v1/indicators/export
    var.api_token: 5c2e9xxxxxxxxxxxxxxxxxxxxxxxxxxxx803d3af402b
    var.lookback_range: 24h
    var.first_interval: 400h
    var.interval: 5m
  anomali:
    enabled: true
    var.input: httpjson
    var.url: https://limo.anomali.com/api/v1/taxii2/feeds/collections/107/objects
    var.username: guest
    var.password: guest
    var.first_interval: 400h
    var.interval: 5m
  anomali:
    enabled: true
    var.input: httpjson
    var.url: https://limo.anomali.com/api/v1/taxii2/feeds/collections/135/objects
    var.username: guest
    var.password: guest
    var.first_interval: 400h
    var.interval: 5m
[...] (a lot of different anomali configurations)

"firewall index" -- One of the dev's said that custom indexes weren't supported. If your running Palo Alto, Cisco, Fortinet you can feed it to Filebeat with the native modules. Fair warning the Fortinet one with 6.4+ works only on default policy names and nothing custom it seems.

The threatintel config looks good, but to be nice to the feeds, changing to 30 minutes would be better. Very few changes occur, and if your rule runs every 60 minutes you won't see anything new anyway. The anomali feeds are temperamental; try disabling them to get everything to start, then add them back in one by one.
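As a sketch, that just means raising var.interval on each feed in threatintel.yml, for example:

```yaml
# Sketch: poll the abuse.ch URL feed every 30 minutes instead of 10
- module: threatintel
  abuseurl:
    enabled: true
    var.input: httpjson
    var.url: https://urlhaus-api.abuse.ch/v1/urls/recent/
    var.interval: 30m
```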

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.