Coordinating Nodes High Circuit Breaker Tripped Counts

Thanks again for the additional information! We'll take a look at those potential optimizations. I think the sample approach for Account Password Reset Remotely looks promising. In [Rule Tuning] Potential Privilege Escalation via PKEXEC, the original EQL query treats the * characters as wildcards, whereas the modified proposal to use match on file.path.text treats * as a literal *. That may account for a portion of the performance difference, but it also changes the result set. I don't know whether the changes to the result set would be a problem. The rule author team would know better and will engage on the GitHub issue.
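To illustrate why the result set changes, here is a minimal Python sketch of the two interpretations of *. It is not how Elasticsearch evaluates either query; it only contrasts wildcard semantics with literal semantics. The pattern and paths are hypothetical and are not taken from the actual rule.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern: str, value: str) -> bool:
    """Treat '*' as 'any run of characters' (wildcard semantics)."""
    return fnmatchcase(value, pattern)

def literal_match(pattern: str, value: str) -> bool:
    """Treat '*' as an ordinary character (literal semantics)."""
    return pattern == value

# Hypothetical pattern and file paths, for illustration only.
pattern = "/tmp/*/example"
paths = ["/tmp/a/example", "/tmp/*/example", "/var/a/example"]

wildcard_hits = [p for p in paths if wildcard_match(pattern, p)]
literal_hits = [p for p in paths if literal_match(pattern, p)]

print("wildcard:", wildcard_hits)
print("literal: ", literal_hits)
```

The wildcard interpretation matches any path with something between /tmp/ and /example, while the literal interpretation only matches a path that actually contains the * character, so the two versions of a rule can return different alerts.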

Thanks for calling this out. I had completely missed testing leading/trailing characters as part of the suggestion. I've replied to that issue with some additional context for the rule authors.

If you duplicate the rules that consistently time out and edit the duplicates to select the "Do not use @timestamp as a fallback timestamp field" option, that may significantly improve performance when there are future timestamps in frozen tier indices. (In section I, "Timestamp override" docs).

I went ahead and tested this out on Abnormal Process ID or Lock File Created and Cron Job Created or Changed by Previously Unknown Process. Both now seem to complete in ~20-30 seconds rather than timing out after 2 minutes, which is definitely better.

A somewhat related question: has it been considered to just exclude the cold/frozen tiers from detection rules? I tested this method on one of the rules as well:

I got similar performance to disabling the fallback, but with the advantage of keeping the fallback enabled.
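For readers who want to try the same idea, here is a rough sketch of what a tier-exclusion filter could look like. It assumes the _tier metadata field and the data_cold / data_frozen tier names, and the helper name exclude_tiers_filter is made up for this example. It is not necessarily the exact filter used in the test above.

```python
import json

def exclude_tiers_filter(tiers=("data_cold", "data_frozen")):
    """Build a Query DSL filter that skips documents on the given data tiers.

    Assumption: the cluster exposes the `_tier` metadata field and uses the
    standard tier names; adjust the list for your own deployment.
    """
    return {
        "bool": {
            "must_not": [
                {"terms": {"_tier": list(tiers)}}
            ]
        }
    }

# Print the JSON so it can be pasted wherever a rule accepts a custom
# Query DSL filter.
tier_filter = exclude_tiers_filter()
print(json.dumps(tier_filter, indent=2))
```

Because the filter only restricts which tiers are searched, the rule's timestamp fallback setting can stay as it is.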

Also, we're actively working on making it easier to customize these prebuilt rules without having to duplicate them first.

This sounds nice. Not having to duplicate rules to make minor adjustments would make things significantly easier to maintain in some areas.