A few questions around the can_match feature

Hi, @jpountz and @ruflin. I am working on a migration path to move our on-prem environment from xxxbeat-* indices to data streams based on Elastic Agent. This will require updating or recreating all sorts of objects in Kibana (visualizations, saved searches, Security rules with EQL queries, ...) as well as queries in Python scripts. At this stage I am still experimenting in a lab environment and am trying to understand the new indexing strategy thoroughly in order to come up with the most appropriate migration strategy. I have watched a few YouTube videos from the Elastic community about the new indexing strategy, including "Deep dive into the new Elastic Indexing Strategy", and have a basic understanding of how it works.

For optimization, we would like to take advantage of constant_keyword and the can_match feature as much as possible. For the sake of this conversation, let's assume I am working with the base index pattern logs-* and am trying to work with the dataset system.security.
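
The intuition behind combining constant_keyword with can_match can be modelled in a few lines. This is a simplified toy model, not Elasticsearch's implementation: the mapping dict uses the real `constant_keyword` type and its `value` parameter, while `Shard`, `can_match`, `shards_to_search` and the backing-index names are hypothetical illustrations. The idea is that every document in an index shares one constant value, so a term filter on a different value rules out the whole shard without looking at any documents.

```python
from dataclasses import dataclass

# Mapping fragment: data_stream.dataset as a constant_keyword field, so every
# document in an index carrying this mapping has the same value.
CONSTANT_DATASET_MAPPING = {
    "properties": {
        "data_stream": {
            "properties": {
                "dataset": {"type": "constant_keyword", "value": "system.security"}
            }
        }
    }
}


@dataclass(frozen=True)
class Shard:
    """Hypothetical stand-in for a shard: its backing index and constant dataset."""

    index: str
    dataset: str


def can_match(shard: Shard, wanted_dataset: str) -> bool:
    """Toy pre-filter: a term filter on a constant_keyword field can only match
    when the shard's constant value equals the requested value."""
    return shard.dataset == wanted_dataset


def shards_to_search(shards: list[Shard], wanted_dataset: str) -> list[Shard]:
    """Keep only shards that could match; the others would be skipped."""
    return [s for s in shards if can_match(s, wanted_dataset)]


# Hypothetical shards reached through the logs-* pattern.
example_shards = [
    Shard(".ds-logs-system.security-default-2024.01.01-000001", "system.security"),
    Shard(".ds-logs-system.application-default-2024.01.01-000001", "system.application"),
]
print([s.index for s in shards_to_search(example_shards, "system.security")])
# prints ['.ds-logs-system.security-default-2024.01.01-000001']
```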

From the previous topic, "How does can_match functionality work", I understand that for a visualization to use the can_match feature, I will have to include something like data_stream.dataset: "system.security" in the visualization filter. But I have a few more questions to make sure I fully understand the concept and how to take advantage of it.
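
For the Python scripts, a rough hand-written analogue of a dataset filter is a `term` query inside a `bool` query's `filter` clause. This is a sketch under that assumption: `dataset_search_body` is a hypothetical helper, and it does not claim to reproduce exactly what Kibana generates for a visualization filter.

```python
import json


def dataset_search_body(dataset: str, *extra_filters: dict) -> dict:
    """Build a _search body with the dataset condition in bool.filter,
    followed by any additional filter clauses."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"data_stream.dataset": dataset}},
                    *extra_filters,
                ]
            }
        }
    }


body = dataset_search_body(
    "system.security",
    {"term": {"winlog.event_id": "4624"}},
)
print(json.dumps(body, indent=2))
```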

  1. Does the can_match feature only work when a condition like data_stream.dataset: "system.security" is specified as a filter, or does it also work when it is specified in a query/query_string?
    For example, a Kibana Discover search for 'data_stream.dataset: "system.security" and winlog.event_id: "4624"' with no filter set in Discover.
  2. How would I take advantage of this in an EQL query? Something like any where (data_stream.dataset: "system.security" and winlog.event_id: "4624")?
  3. My third question has to do with Python queries, but I think the answer to question 1 might answer it.

  1. If you pass data_stream.dataset:"system.security" in a query_string, Elasticsearch won't be able to skip a shard using the can_match phase, but shard requests will still be pretty cheap, as Elasticsearch will easily notice that the query cannot possibly match any docs once the query string gets parsed (assuming that the shard has a different value for data_stream.dataset).
  2. EQL uses the query DSL and the _search API under the hood, so this will work transparently.
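
To make the two answers concrete, here is a sketch of the request bodies a Python script might send. It only builds dicts; the endpoints named in the final comment (`_search` and `_eql/search`) are the standard Elasticsearch APIs, but the helper names are hypothetical. Per the first answer, the query_string body cannot be skipped in the can_match phase, though shards with a different dataset should still return quickly. The `use_filter=True` variant uses the EQL search API's `filter` parameter; it is included only as an alternative for comparison, not as something the answer recommends, and its effect on shard skipping is not established here.

```python
import json


def query_string_body(dataset: str, event_id: str) -> dict:
    """Dataset condition inside a query_string, as in question 1."""
    return {
        "query": {
            "query_string": {
                "query": f'data_stream.dataset:"{dataset}" AND winlog.event_id:"{event_id}"'
            }
        }
    }


def eql_body(dataset: str, event_id: str, use_filter: bool = False) -> dict:
    """EQL search body; optionally move the dataset condition into `filter`."""
    if use_filter:
        # Dataset condition as a Query DSL filter next to the EQL query.
        return {
            "query": f'any where winlog.event_id == "{event_id}"',
            "filter": {"term": {"data_stream.dataset": dataset}},
        }
    # Dataset condition written inside the EQL query itself, as in question 2.
    return {
        "query": (
            f'any where data_stream.dataset == "{dataset}" '
            f'and winlog.event_id == "{event_id}"'
        )
    }


# Bodies for POST logs-*/_search and POST logs-*/_eql/search respectively.
print(json.dumps(query_string_body("system.security", "4624"), indent=2))
print(json.dumps(eql_body("system.security", "4624", use_filter=True), indent=2))
```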

Hi @jpountz, sorry for the late reply. I have been away from work training for a certification. Thanks for your reply.

For Kibana KQL, which approach would be better optimized for data streams: adding data_stream.dataset as a filter, or including it in the KQL query itself?
