How does can_match functionality work

seanziee · November 1, 2021, 5:03pm

Just finished watching the video (@ruflin): Deep dive into the new Elastic Indexing Strategy - YouTube, and it's mentioned that because of the can_match functionality, indices that don't have relevant documents won't be queried. Great! However, how exactly does this work? I'm looking to set up my cluster with time series Metricbeat data and I want to know if my indices will take advantage of this; my current architecture does not use data streams and I'm trying to decide if I should change.

I understand that with a range query, indices that don't have any data within that range should not get queried at all. (Is that automatic?)

What about field searching? Does it only work for fields that I define as constant_keyword, or if I search for a field and an index doesn't have that field, will that index automatically be ignored? Is that the primary reason why separating data streams by metricset by default makes sense: the default Metricbeat template has all mappings defined, which means all indices will get searched? I want to see if I can have this can_match functionality without changing my current architecture to data streams. Thanks!

ruflin · November 2, 2021, 10:52am

I will need @jpountz for the deep technical details, but in general the constant_keyword fields are the secret sauce. So even if you don't use data streams but query on constant_keyword fields, you should still get the benefits. Looking at it differently, just having data streams will not bring you the benefit either as long as you don't query on the constant_keyword parts.

As long as all data from Metricbeat goes to a single index or data stream, you will not really get the benefits. That is why we split it up, so you can "prefilter" on the data you want (type, dataset, namespace), and made those fields constant_keyword.
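A toy sketch in Python of the prefiltering idea described above. It is not Elasticsearch's actual can_match implementation, only a model of the principle: when every document in an index shares one constant_keyword value, a query for a different value can rule out the whole index without reading any documents. `IndexInfo`, `can_match`, and the index names are hypothetical, and the mapping assumes the field is called `data_stream.dataset`.

```python
from dataclasses import dataclass

# Mapping fragment for one per-dataset index. A constant_keyword field holds a
# single value for the whole index. The field name is an assumption.
cpu_mapping = {
    "mappings": {
        "properties": {
            "data_stream": {
                "properties": {
                    "dataset": {"type": "constant_keyword", "value": "system.cpu"}
                }
            }
        }
    }
}


@dataclass
class IndexInfo:
    """Hypothetical stand-in for what is known about an index up front."""
    name: str
    dataset: str  # the index-wide constant_keyword value


def can_match(index: IndexInfo, wanted_dataset: str) -> bool:
    """Toy check: an index can only match if its constant value is the one queried."""
    return index.dataset == wanted_dataset


indices = [
    IndexInfo("metrics-system.cpu-default", "system.cpu"),
    IndexInfo("metrics-system.memory-default", "system.memory"),
    IndexInfo("metrics-nginx.stubstatus-default", "nginx.stubstatus"),
]

searched = [i.name for i in indices if can_match(i, "system.cpu")]
skipped = [i.name for i in indices if not can_match(i, "system.cpu")]

print("searched:", searched)
print("skipped:", skipped)
```

If all three datasets lived in a single index, there would be no per-index constant value to compare against, and nothing could be skipped this way.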

seanziee · November 2, 2021, 12:06pm

Thanks for the response.

So to confirm, the way that I can get the search speed enhancement is by creating a separate index for each dataset, setting the dataset field as a constant_keyword, and then, when I create a dashboard, making sure that each visualization filters on the dataset it needs. Or do I not even need to define filters for each visual? After doing this, I should expect fast response times?

As an aside, is there any way to get access to the dashboards, templates and mappings created for each type of data stream without creating a dummy system and manually exporting and importing?

ruflin · November 3, 2021, 10:34am

Your assumption is correct. The part I wonder about: why not jump directly to the data stream naming scheme?

For your second question: With Elastic Agent and Fleet, all the assets are delivered as packages or shown as integrations in the UI. This is our mechanism to bundle and ship assets. In theory you could use it today for your use case, but it's not fully ready yet. I hope we can offer this tooling to you sometime in the future. Have a look at these two repos:

seanziee · November 3, 2021, 10:48am

I'm asking the questions to get a clearer idea of how it works in the background; I may jump directly to the default data stream route.

To confirm, do I need to filter by dataset for each visual in order to get the improved response times, or will that be taken care of automatically? Can I use the same index pattern for all of the different datasets and still get the improved response times?

seanziee · November 3, 2021, 12:24pm

Looking at the integrations link you sent me, it seems like many of the default dashboards use visuals that use the metrics-* index pattern. (I was imagining that there would be an index pattern for each dataset.)

ruflin · November 3, 2021, 1:59pm

To get the benefit, you have to apply the filter on each visualisation, as you mentioned. Another way to think of it: each query needs it. And then you get all the benefits automatically.
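As a rough illustration of what "each query needs it" could look like, here is a minimal Python sketch that builds a search body with the dataset filter and a time range. The helper `build_dataset_query` is hypothetical, and the field names `data_stream.dataset` and `@timestamp` are assumptions that follow the data stream conventions; adjust them to your own mapping.

```python
import json


def build_dataset_query(dataset: str, since: str = "now-15m") -> dict:
    """Build a search body that filters on the constant_keyword field first.

    Hypothetical helper: the field names are assumptions, not taken from
    any particular template.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Filter on the index-wide constant value.
                    {"term": {"data_stream.dataset": dataset}},
                    # Restrict to a time range as well.
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        }
    }


body = build_dataset_query("system.cpu")
print(json.dumps(body, indent=2))
```

A body like this can be sent against a broad pattern such as metrics-*. The `_shards` section of the search response includes a `skipped` count, which is one way to check whether shards were actually skipped.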

There is indeed a global Kibana index pattern, logs-* for logs for example. But as the query in the visualisation filters directly on the constant keywords first, you get the benefits. The global logs-* index pattern is there more for users who build their own visualisations.

Taking the nginx example, it would likely be even a bit more efficient to have an index pattern called logs-nginx.access-* that is directly part of the visualisation, so the indices are already prefiltered before a query runs. If we went down this path, the user would see a LOT of index patterns in the Discover drop-down. A solution we are discussing with the Kibana team is that we could "embed" index patterns directly into the visualisations one day.
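To make the difference between a broad and a narrow index pattern concrete, here is a small Python sketch. It uses `fnmatch` only as an approximation of how a wildcard pattern resolves to index names; the index names and the `resolve` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical index names following the type-dataset-namespace scheme.
index_names = [
    "logs-nginx.access-default",
    "logs-nginx.error-default",
    "logs-system.syslog-default",
]


def resolve(pattern: str, names: list) -> list:
    """Return the names a wildcard pattern would cover (approximation)."""
    return [name for name in names if fnmatchcase(name, pattern)]


broad = resolve("logs-*", index_names)
narrow = resolve("logs-nginx.access-*", index_names)

print("logs-* covers:", broad)
print("logs-nginx.access-* covers:", narrow)
```

With the narrow pattern, the other indices are never part of the request at all, whereas with logs-* they are included and then have to be ruled out by the constant_keyword filter.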

Hope the above helps to share some more background info. Happy to dive deeper if needed.

seanziee · November 3, 2021, 2:21pm

Yeah, there would be WAY too many index patterns for my use case. I think I'll stick to filtering per visual, though being able to embed index patterns is a cool thought.

Thanks so much for your help!

system · December 1, 2021, 2:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
