Security detection rules cause `circuit_breaking_exception` on medium-sized deployments

Hi All,

I noticed that a few detection rules consume a lot of memory and often cause circuit_breaking_exception errors in medium-sized deployments (~265 Winlogbeat deployments).

Elasticsearch version: 7.14.1

Offending Rules:

  • Installation of Custom Shim Databases
  • Parent Process PID Spoofing
  • Potential Process Herpaderping Attempt

These rules all seem to use a good amount of memory to run. Here is the exception I generally see:

An error occurred during rule execution: message: "circuit_breaking_exception: [circuit_breaking_exception] Reason: [eql_sequence] Data too large, data for [sequence_inflight] would be [3221924600/3gb], which is larger than the limit of [3221225472/3gb]" name: "Installation of Custom Shim Databases" id: "ac10bafe-a91f-11eb-a252-7f35a8822039" rule id: "c5ce48a6-7f57-4ee8-9313-3d0024caee10" signals index: ".siem-signals-security"
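The byte counts in that message show how far the request went past the limit. Here is a minimal Python sketch that pulls both numbers out of a breaker message and computes the difference. The function name `breaker_overshoot` is hypothetical, and the regex assumes the message keeps the "would be [N/...], which is larger than the limit of [M/...]" wording shown above.

```python
import re

# Assumes the "would be [N/...], which is larger than the limit of [M/...]"
# wording used in the circuit_breaking_exception message above.
_BREAKER_RE = re.compile(
    r"would be \[(\d+)/[^\]]*\], which is larger than the limit of \[(\d+)/[^\]]*\]"
)


def breaker_overshoot(message: str) -> tuple[int, int, int]:
    """Return (requested_bytes, limit_bytes, overshoot_bytes) from a breaker message."""
    match = _BREAKER_RE.search(message)
    if match is None:
        raise ValueError("no circuit breaker byte counts found in message")
    requested, limit = int(match.group(1)), int(match.group(2))
    return requested, limit, requested - limit
```

For the message above, this yields a request of 3,221,924,600 bytes against a limit of 3,221,225,472 bytes (exactly 3 × 1024³, i.e. 3 GiB), an overshoot of 699,128 bytes.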

Has anyone else run into this issue with these rules? If so, how did you solve it?

For further context, the rules are executed across 3 coordinating-only nodes in the cluster, each with 8GB of RAM (6GB of heap).
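The 3gb limit in the error lines up with that heap size. According to the Elasticsearch circuit breaker settings documentation, `breaker.eql_sequence.limit` is a percentage of the JVM heap and defaults to 50%. Assuming that default is in effect, the arithmetic can be sketched as follows. The helper name `eql_sequence_limit_bytes` is hypothetical.

```python
GIB = 1024 ** 3


def eql_sequence_limit_bytes(heap_bytes: int, limit_percent: int = 50) -> int:
    """Byte limit of the EQL sequence circuit breaker for a given heap size.

    Assumes breaker.eql_sequence.limit is left at its documented default
    of 50% of the JVM heap; pass another percentage if it was changed.
    """
    return heap_bytes * limit_percent // 100
```

`eql_sequence_limit_bytes(6 * GIB)` returns 3221225472, the same `[3221225472/3gb]` limit reported in the error. That is consistent with a 6GB heap and the default percentage, though it does not by itself prove which node hit the breaker.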

Hi @BenB196! I can't say for certain, but since all of those rules are EQL rules, I'm guessing that you may be missing fields that the EQL sequences are joining on. It's likely `process.entity_id`, but you'd have to verify that.

When a join field is missing, EQL treats its value as null and joins on that null value. This produces a large number of unexpected sequences and causes the circuit breaking exception you're seeing.
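To illustrate why a null join key is so costly, here is a toy Python model of a two-stage sequence joined on one field. It is a simplification, not how the EQL sequence matcher is implemented internally: it ignores event ordering and maxspan, and simply counts how many (stage A, stage B) pairs could share a join key when missing values all collapse to `None`.

```python
from collections import defaultdict


def candidate_pairs(events: list[dict], join_field: str) -> int:
    """Count (A, B) pairs sharing a join key in a toy two-stage sequence.

    Events lacking join_field get the key None, so they all land in one
    group and can pair with every other keyless event.
    """
    stage_a: dict = defaultdict(int)
    stage_b: dict = defaultdict(int)
    for ev in events:
        key = ev.get(join_field)  # missing field -> None
        if ev["stage"] == "a":
            stage_a[key] += 1
        else:
            stage_b[key] += 1
    # Pairs only form within the same join key.
    return sum(count * stage_b.get(key, 0) for key, count in stage_a.items())
```

With 1,000 events per stage and a unique `process.entity_id` per process, this model gives 1,000 candidate pairs. With the field missing on every event it gives 1,000,000, since every A event matches every B event through the shared `None` key.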

While we're actively discussing a fix for this behavior, the near-term solution would be to duplicate those rules and edit the sequence to either remove that join field or replace it with one more appropriate for your data.
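When choosing a replacement join field, it can help to check how often each candidate is actually populated. Here is a minimal Python sketch that measures coverage over a sample of event `_source` documents. The candidate field names and the helper names `get_path` and `field_coverage` are illustrative assumptions. High coverage is necessary but not sufficient: `process.pid`, for example, is recycled by the operating system, so joining on it alone can link unrelated processes.

```python
def get_path(doc: dict, dotted: str):
    """Look up a dotted ECS path such as 'process.entity_id' in a nested
    document, also accepting a flattened key. Returns None if absent."""
    if dotted in doc:
        return doc[dotted]
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def field_coverage(docs: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of docs in which each candidate join field is populated."""
    if not docs:
        return {f: 0.0 for f in fields}
    return {
        f: sum(get_path(d, f) is not None for d in docs) / len(docs)
        for f in fields
    }
```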

I hope that helps, cheers!

What is the output from the _cluster/stats?pretty&human API in Elasticsearch?

Here is the output of the requested command (posted on Pastebin.com; the preview begins `{ "_nodes" : { "total" : 30, "successful" : 30, "failed" : 0`).

Note: I've upgraded the cluster to 7.15.1 since originally making this post.

Thanks for that. What heap sizes do you run? It looks like ~6GB per node.
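A per-node estimate like that can be read off the `_cluster/stats` response by dividing the summed `nodes.jvm.mem.heap_max_in_bytes` by `nodes.count.total`. Here is a minimal Python sketch, assuming the response has already been loaded into a dict (for example with `json.load`). The helper name is hypothetical. Because it averages over all nodes, it hides differences between roles. To see the heap on the coordinating-only nodes specifically, per-node figures from `GET _nodes/stats/jvm` are more precise.

```python
def avg_heap_per_node_gib(cluster_stats: dict) -> float:
    """Average max heap per node, in GiB, from a _cluster/stats response.

    heap_max_in_bytes is summed across all nodes, so dividing by the
    node count gives a cluster-wide average, not any single node's heap.
    """
    nodes = cluster_stats["nodes"]
    total_heap = nodes["jvm"]["mem"]["heap_max_in_bytes"]
    node_count = nodes["count"]["total"]
    return total_heap / node_count / 1024 ** 3
```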

Yes, the coordinating nodes run ~6GB (they have 8GB total and use the automatic heap setting, so whatever heap size Elasticsearch decides on is what they get).