Elasticsearch rollover: query performance thoughts

Hello,
In the past I used one index per month, named:
logs-YYYY-MM
When querying, I built a list of the relevant indices, joined them with commas, and added a date range filter.
For example, if the date range is the last 1.5 months, the request would be:
GET logs-2020-04,logs-2020-05/_search..
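A minimal sketch of that "build the index list from the date range" step, assuming the monthly `logs-YYYY-MM` naming above. The function name `monthly_indices` is hypothetical, and the request body (the `..` part) is left out.

```python
from datetime import date


def monthly_indices(start: date, end: date, prefix: str = "logs") -> list[str]:
    """Return the monthly index names (prefix-YYYY-MM) covering [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month from start's month up to and including end's month.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


# Comma-joined index list, as used in the request path above.
path = ",".join(monthly_indices(date(2020, 4, 15), date(2020, 5, 31))) + "/_search"
print(path)  # logs-2020-04,logs-2020-05/_search
```

The upside of this approach is that only the listed indices are ever touched. The downside is that the client has to know the naming scheme, which stops working once ILM rollover names the indices for you.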

When I started using ILM, I changed this to query an index pattern with a date range filter:
GET logs-*/_search
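For illustration, a sketch of what such a date-range body might look like, built as a plain dict in Python. The timestamp field name `@timestamp` and the helper name `last_n_days_body` are assumptions, not taken from the original post. The range sits in a `bool` `filter` clause, so it restricts the hits without affecting scoring.

```python
import json


def last_n_days_body(days: int, field: str = "@timestamp") -> dict:
    """Build a _search body keeping only documents from the last `days` days.

    `field` is assumed to be the date field of the log documents.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Date math: "now-3d/d" means 3 days ago, rounded down to the day.
                    {"range": {field: {"gte": f"now-{days}d/d", "lte": "now"}}}
                ]
            }
        }
    }


# Body for: GET logs-*/_search
print(json.dumps(last_n_days_body(3), indent=2))
```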

This makes me think that with monthly indices (30GB each, with 1 replica), even if the date filter is the last 3 days, the query will (potentially) scan shards going back a few years.

My questions are:

1. How can I prevent this scan?
   Elasticsearch supports the pre_filter_shard_size parameter, but it seems the pre-filter only kicks in when the number of shards is > 128, probably for performance reasons.

2. How significant is this for Lucene (scanning a shard that has 0 documents matching the date filter)?

3. Any other thoughts?
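`pre_filter_shard_size` is passed as a URL query parameter of the `_search` API. As I understand the docs, the pre-filter round trip runs when the number of shards the request expands to exceeds this threshold, so a low value such as 1 asks for it on practically every multi-shard search. The default conditions have changed between releases, so check the docs for your version. A small sketch of building such a request path (the helper name `search_path` is hypothetical):

```python
from typing import Optional
from urllib.parse import urlencode


def search_path(index_pattern: str, pre_filter_shard_size: Optional[int] = None) -> str:
    """Build a _search path, optionally setting the pre-filter threshold."""
    path = f"{index_pattern}/_search"
    if pre_filter_shard_size is not None:
        # Query-string parameter; not part of the JSON request body.
        path += "?" + urlencode({"pre_filter_shard_size": pre_filter_shard_size})
    return path


print(search_path("logs-*", 1))  # logs-*/_search?pre_filter_shard_size=1
```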

There have been a few improvements in Elasticsearch for handling this type of range query. You don't need to worry about it reading all the data in every shard: it will skip shards that don't contain relevant data.
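A simplified model of the idea in the answer above: if each shard knows the minimum and maximum value of the date field it holds, any shard whose range does not overlap the query's range cannot match and can be skipped. This is a conceptual sketch only, not Elasticsearch's actual implementation. `ShardStats` and `shards_to_search` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ShardStats:
    """Hypothetical per-shard summary: a name plus the min/max timestamp it holds."""
    name: str
    min_ts: datetime
    max_ts: datetime


def shards_to_search(shards: list, gte: datetime, lte: datetime) -> list:
    """Keep only shards whose [min_ts, max_ts] overlaps the query range [gte, lte]."""
    return [s.name for s in shards if s.max_ts >= gte and s.min_ts <= lte]


shards = [
    ShardStats("logs-2018-01", datetime(2018, 1, 1), datetime(2018, 1, 31, 23, 59)),
    ShardStats("logs-2020-05", datetime(2020, 5, 1), datetime(2020, 5, 31, 23, 59)),
]
# A "last 3 days" query only needs the shard whose bounds overlap it.
print(shards_to_search(shards, datetime(2020, 5, 28), datetime(2020, 5, 31)))
# ['logs-2020-05']
```

Under this model, a years-old shard costs only a cheap bounds check rather than a full scan. That answers the "0 relevant documents" concern in the question.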

Thanks a lot! Is there anywhere I can read about it?