Would the search performance be very slow if a data stream consists of too many or too large cold backing indices?

Morriaty · January 14, 2022, 7:46am

Hi, I'm moving from ES 6.x to 7.x, found the new data stream feature, and would like to give it a try.

But I'm a bit worried about search performance, because the docs explain: "When you submit a read request to a data stream, the stream routes the request to all its backing indices." (doc source)

Say we have a data stream named audit_log, consisting of hundreds of backing indices:

.ds.audit_log_2022.01_000001 HOT PHASE
.ds.audit_log_2021.12_000002 COLD PHASE
.ds.audit_log_2021.12_000001 COLD PHASE
.....
.ds.audit_log_2021.01_000100 COLD PHASE

If we run a search request like this:

GET audit_log/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-1M/d"
          }
        }
      },
      "must": [
        {
          "match": {
            "auditor": "tom"
          }
        }
      ]
    }
  }
}

Can this search request be routed only to the few relevant backing indices because of the @timestamp range clause?
If the answer is no, is there any alternative feature that could help us route to indices according to date fields by default?

Morriaty · January 17, 2022, 4:17am

Found a similar question: Elasticsearch search query on hot and warm nodes - Stack Overflow

For data streams, the problem becomes:

add the aliases ds_search_recent and ds_search_all to the data stream
remove the alias ds_search_recent when a backing index enters the warm (or cold) phase

But currently there seems to be no automatic way to do this.

Morriaty · January 17, 2022, 6:59am

A related GitHub issue: Add ILM action to add/remove aliases · Issue #47881 · elastic/elasticsearch · GitHub

Bad news: it has been open since 2019.

Morriaty · January 17, 2022, 7:41am

Another related PR: Use @timestamp field to route documents to a backing index of a data stream by martijnvg · Pull Request #82079 · elastic/elasticsearch · GitHub

It seems to be a new feature in ES 8.1, which is not released yet.

DavidTurner · January 17, 2022, 10:13am

"Can this search request be routed only to the few relevant backing indices because of the @timestamp range clause?"

Technically no, but in practice ES achieves this goal anyway: it routes the search to every backing index but the ones that don't match the timestamp range get optimised into a MatchNoDocsQuery which obviously hits no data and takes no time to execute.

Morriaty · January 18, 2022, 6:54am

Thank you, David. I'm still curious: is this a feature of backing indices or of the filter cache?

I mean, would it be slow on the first search query, and slow again whenever the time span misses the filter cache? In common time series searches, the time span changes frequently.

DavidTurner · January 18, 2022, 8:16am

This happens when rewriting the query into its optimised form, long before the filter cache gets involved. It involves comparing two long timestamp values, which is almost instantaneous and doesn't involve any caching.
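
To make the comparison concrete, here is a minimal sketch in Python. It is not Elasticsearch's actual implementation (which is Java and internal to the search path); it only illustrates the idea of checking a query's timestamp range against each backing index's known minimum and maximum @timestamp. The names IndexTimeBounds and can_match, the index names, and the timestamp values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexTimeBounds:
    """Hypothetical per-index metadata: min and max @timestamp in epoch millis."""
    name: str
    min_ts: int
    max_ts: int


def can_match(bounds: IndexTimeBounds,
              gte: Optional[int] = None,
              lte: Optional[int] = None) -> bool:
    """Return False when the query range cannot overlap the index's timestamps.

    A False result corresponds to the case where the range query would be
    rewritten into a match-nothing query for that index.
    """
    if gte is not None and bounds.max_ts < gte:
        return False  # every document is older than the lower bound
    if lte is not None and bounds.min_ts > lte:
        return False  # every document is newer than the upper bound
    return True


DAY_MS = 24 * 60 * 60 * 1000

# Illustrative backing indices, newest first (values are made up).
indices = [
    IndexTimeBounds("audit_log_hot", min_ts=100 * DAY_MS, max_ts=130 * DAY_MS),
    IndexTimeBounds("audit_log_cold_1", min_ts=70 * DAY_MS, max_ts=100 * DAY_MS - 1),
    IndexTimeBounds("audit_log_cold_2", min_ts=40 * DAY_MS, max_ts=70 * DAY_MS - 1),
]

# Lower bound of the range filter, already resolved to epoch millis.
query_gte = 95 * DAY_MS

searched = [ix.name for ix in indices if can_match(ix, gte=query_gte)]
skipped = [ix.name for ix in indices if not can_match(ix, gte=query_gte)]

print("searched:", searched)
print("skipped:", skipped)
```

Each check is at most two integer comparisons per index, so it costs the same on the first query as on any later one, and changing the time span does not make it slower.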

