Omit indices when searching on multiple indices

A question and maybe a feature request. We have 100 indices for filebeat with this pattern:

filebeat-{CLUSTER_NAME}-{NAMESPACE}-{DATE}


Documents are:

{"@timestamp" : "..." , "cluster" : "prod", "namespace" : "kube-system", ... "message": "hello world"}

I am wondering: when we search against all filebeat indices for a particular time range and cluster, is Elasticsearch smart enough to first select only the indices that could contain matches?

Hi @ebuildy

What version are you using?

The short answer is that Elasticsearch has some built-in smarts to pre-filter the applicable indices based on their timestamps relative to the time range of your search, IF you are doing normal time series ingestion (rollover, daily indices, ILM, etc.) and are not reopening old indices and writing to them.

Elasticsearch will not know about the CLUSTER or NAMESPACE name in the index name and will not pre-filter on it UNLESS you create a data view or something similar to limit the search upfront.
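
One way to limit the search upfront is to put the cluster and namespace into the index expression itself, since they are already part of the index name. Below is a minimal sketch, assuming the naming scheme from the question. `filebeat_index_pattern` is a hypothetical helper, not an Elasticsearch API; the string it returns would be used as the index part of the search request or as the pattern of a data view.

```python
def filebeat_index_pattern(cluster="*", namespace="*", date="*"):
    """Build an index expression for filebeat-{CLUSTER_NAME}-{NAMESPACE}-{DATE}.

    Hypothetical helper: any part left as "*" matches every value.
    """
    return f"filebeat-{cluster}-{namespace}-{date}"


# Only indices of the "prod" cluster, any namespace, any date.
pattern = filebeat_index_pattern(cluster="prod")
print(pattern)  # filebeat-prod-*-*
```

Note that names containing hyphens (such as "kube-system") make wildcard matching on a single part less precise, so it is safer to fix the leading parts of the name and leave the wildcard at the end.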

So your answer is yes and no....

Others may have more details... but that is my top-level understanding

Very interesting,

I know this is the first optimisation step in some DBs: "don't open a file / resource if you don't need it".

I am a big fan of Apache Spark and all the data stuff (Parquet, data lakes, etc.). They do something called "partition pruning" to work only on the relevant files; Elasticsearch could implement this concept.

To make the analogy with the Parquet file format: Elasticsearch could save, for each index, the min and max values of time fields and the term values of string fields (with a limit), and pre-filter indices before doing the search.
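
The pre-filtering idea described above can be sketched in a few lines of Python. This only illustrates the proposal and does not show how Elasticsearch works internally; `IndexStats` and `prune` are hypothetical names, and the index names and timestamps are made-up sample values.

```python
from dataclasses import dataclass, field


@dataclass
class IndexStats:
    """Hypothetical per-index summary, similar in spirit to Parquet footer statistics."""
    name: str
    min_ts: str  # smallest @timestamp (ISO 8601 in UTC, so strings sort correctly)
    max_ts: str  # largest @timestamp
    terms: dict = field(default_factory=dict)  # field name -> set of values seen


def prune(indices, start, end, **term_filters):
    """Return the names of indices that could contain matching documents."""
    kept = []
    for idx in indices:
        # Skip indices whose time range does not overlap [start, end].
        if idx.max_ts < start or idx.min_ts > end:
            continue
        # Skip indices that track the field but never saw the requested value.
        # Untracked fields (e.g. over the limit) cannot be used to prune.
        if any(f in idx.terms and v not in idx.terms[f]
               for f, v in term_filters.items()):
            continue
        kept.append(idx.name)
    return kept


# Made-up sample statistics for three indices.
stats = [
    IndexStats("filebeat-prod-kube-system-2024.01.01",
               "2024-01-01T00:00:00Z", "2024-01-01T23:59:59Z", {"cluster": {"prod"}}),
    IndexStats("filebeat-staging-kube-system-2024.01.01",
               "2024-01-01T00:00:00Z", "2024-01-01T23:59:59Z", {"cluster": {"staging"}}),
    IndexStats("filebeat-prod-kube-system-2024.01.02",
               "2024-01-02T00:00:00Z", "2024-01-02T23:59:59Z", {"cluster": {"prod"}}),
]

candidates = prune(stats, "2024-01-01T00:00:00Z", "2024-01-01T12:00:00Z", cluster="prod")
print(candidates)  # ['filebeat-prod-kube-system-2024.01.01']
```

Only the surviving indices would then be opened and searched; the others are skipped without reading any document.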


So, as general advice, it is better to group all documents into date-based indices (fewer but bigger indices), such as "filebeat-YYYY-MM-DD", than into indices like "filebeat-{PRODUCT}-YYYY-WEEK" (more but smaller indices).

Thank you,
