Rule Optimization

Hello, I have a general question about designing optimal rules.
As I understand it, there are three ways of constructing exclusions/filters:

  1. Directly in the query with "NOT" statements
  2. Add as a filter
  3. Rule exceptions

With regard to performance, which one is the most lightweight, i.e. consumes the fewest resources?

Furthermore, if a KQL query has a "NOT" statement in it, does the order matter? Is it in any way beneficial to put the "NOT" statements at the very beginning of a query rather than at the end?

In addition, which language is preferable with regard to performance: KQL, DSL, ESQL, or Lucene?

Finally, are there any articles/posts/documentation you would recommend reading?

Hi @SebJen,

Lots of good questions here. In general, different ways of representing the same logic in a rule will have equivalent performance so it comes down to which representation is most convenient for managing the logic. Under the hood, the rule query, filters, and most exceptions are combined into a single DSL query that we send to Elasticsearch. Elasticsearch does query rewriting and optimization as part of the execution process, so I wouldn't expect simple transformations like reordering statements to have much effect.
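To make the "single DSL query" idea concrete, here is a minimal sketch of how the three kinds of logic could end up in one bool query. The function name `build_rule_query` and the field values are made up for illustration, and the exact structure the rule executor generates may differ; the point is only that all three inputs land in the same request.

```python
from typing import Any

Clause = dict[str, Any]


def build_rule_query(
    query_clause: Clause, filters: list[Clause], exceptions: list[Clause]
) -> Clause:
    """Combine a rule query, its filters, and its exceptions into one bool query.

    Hypothetical helper: it illustrates the shape of the combined query,
    not the actual code that runs in the rule executor.
    """
    return {
        "bool": {
            "must": [query_clause],          # the rule's own query
            "filter": list(filters),         # filters added in the rule UI
            "must_not": list(exceptions),    # rule exceptions
        }
    }


# Example values are assumptions chosen for illustration only.
rule_query = {"match": {"event.category": "process"}}
filters = [{"term": {"host.os.type": "windows"}}]
exceptions = [{"term": {"process.name": "trusted.exe"}}]

combined = build_rule_query(rule_query, filters, exceptions)
```

Whether an exclusion is written as a "NOT" in the query, as a filter, or as an exception, it becomes one more clause in the same request, which is why the choice is mostly about manageability.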

Exceptions do have slower performance if they reference "large" value lists (see "Create and manage value lists" in the Elastic Security Solution 8.17 documentation). If the value list referenced by an exception is small, we are able to fetch the values and include them in our initial query to Elasticsearch, but for large value lists we have to make multiple queries instead, which can have a significant impact on performance. Large value lists are still handy if you have, for example, 100,000 IP addresses that are known-good and want to exclude them from your rules.

Regarding languages to use, KQL, DSL, Lucene, and ESQL should all be equivalent performance-wise. The main difference between KQL, DSL, and Lucene is convenience. ESQL provides significant new aggregation and computation capabilities over the other languages and uses a new query engine.

Some interesting blog posts with more info on ESQL performance:

The benchmarks may be interesting as well:
https://elasticsearch-benchmarks.elastic.co/#tracks/esql/nightly/default/30d

Note that the benchmarks where ESQL is outperforming DSL are aggregations, not just simple search queries. I wouldn't expect much benefit from re-implementing any of your existing Query rules into ESQL.

The #1 recommendation I have for detection rule performance is to ensure that frozen data is not queried by rules. Usually the timestamp on source docs is sufficient for ES to efficiently exclude frozen data, but if the timestamps on incoming docs are not correct then ES can end up querying frozen data repeatedly. See "Configure advanced settings" in the Elastic Security Solution 8.17 documentation for a setting to explicitly exclude frozen data from rules.
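As a rough illustration of what "explicitly excluding frozen data" can look like at the query level, the sketch below wraps a query in a bool clause that rejects documents on the frozen tier via the `_tier` metadata field. The helper name `exclude_frozen_tier` is hypothetical, and this is not necessarily what the advanced setting does internally; the setting remains the recommended way to get this behavior for rules.

```python
from typing import Any

Clause = dict[str, Any]


def exclude_frozen_tier(query: Clause) -> Clause:
    """Wrap a query so that documents on the frozen data tier are skipped.

    Hypothetical helper for illustration. It relies on the `_tier`
    metadata field, whose value for the frozen tier is "data_frozen".
    """
    return {
        "bool": {
            "filter": [query],
            "must_not": [{"terms": {"_tier": ["data_frozen"]}}],
        }
    }


# Assumed example: a rule looking back five minutes on @timestamp.
time_window = {"range": {"@timestamp": {"gte": "now-5m", "lte": "now"}}}
safe_query = exclude_frozen_tier(time_window)
```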

When writing queries, the most common pitfall I see is using wildcards too much. Wildcards are much more expensive to evaluate than a regular query, and the position of a wildcard within a string also has an impact on query cost.
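A toy model can show why wildcard position matters. Assume terms are stored in sorted order (a simplification; the real index structures are more sophisticated). A trailing wildcard like `power*` can jump straight to the matching range, while a leading wildcard like `*.exe` has no useful starting point and has to look at every term. The function names and sample terms below are made up for illustration.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms: list[str], prefix: str) -> tuple[list[str], int]:
    """Trailing wildcard (prefix*): seek to the prefix, stop at the first miss."""
    start = bisect_left(sorted_terms, prefix)
    matches: list[str] = []
    examined = 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches, examined


def suffix_matches(sorted_terms: list[str], suffix: str) -> tuple[list[str], int]:
    """Leading wildcard (*suffix): sorted order does not help, scan everything."""
    matches = [term for term in sorted_terms if term.endswith(suffix)]
    return matches, len(sorted_terms)


terms = sorted(
    [
        "cmd.exe",
        "explorer.exe",
        "powershell.exe",
        "powershell_ise.exe",
        "python",
        "svchost.exe",
    ]
)

prefix_hits, prefix_examined = prefix_matches(terms, "power")
suffix_hits, suffix_examined = suffix_matches(terms, ".exe")
```

With six terms the difference is trivial, but the scan in `suffix_matches` grows with the number of unique terms in the field, whereas `prefix_matches` only touches the matching range plus one.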

Finally, EQL sequences can be tricky with performance. One way performance can be poor with sequences is if the first event in the sequence is very common and later events are much rarer - in that scenario, Elasticsearch must process many candidate sequences. Conversely, if the first event in the sequence is rare, it starts with a much smaller set of candidates, so performance is much better.
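The effect of event ordering in a sequence can be sketched with a small simulation. This is a toy model, not how the EQL engine is implemented: every occurrence of the first event type opens a candidate keyed by host, and a candidate completes when the second event type shows up on the same host. The event names and counts are invented for the example.

```python
def run_sequence(
    events: list[tuple[str, str]], first: str, second: str
) -> tuple[int, int]:
    """Return (candidates_opened, sequences_completed) for a two-step sequence."""
    pending: set[str] = set()  # hosts that currently have an open candidate
    opened = 0
    completed = 0
    for host, kind in events:
        if kind == first:
            pending.add(host)
            opened += 1
        elif kind == second and host in pending:
            pending.discard(host)
            completed += 1
    return opened, completed


# 1000 "common" events spread over 100 hosts, plus a single "rare" event.
events = [(f"host-{i % 100}", "common") for i in range(1000)]
events.insert(500, ("host-7", "rare"))
events.append(("host-7", "common"))

common_first = run_sequence(events, first="common", second="rare")
rare_first = run_sequence(events, first="rare", second="common")
```

Both orderings complete one sequence in this data, but starting with the common event opens 1001 candidates while starting with the rare event opens just one.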