Elastic rollover performance thoughts

liorg2 · May 20, 2020, 12:36pm

Hello,
In the past I had one index per month, named:
logs-YYYY-MM
When querying them, I built a comma-separated list of the indices covering the date range and added a date range filter.
For example, if the date range is the last 1.5 months, the request would be:
GET logs-2020-04,logs-2020-05/_search..
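
The index list described above can be derived from the date range. A minimal sketch, assuming the logs-YYYY-MM naming from this post; the helper names `monthly_indices` and `search_path` are made up for illustration:

```python
from datetime import date
from typing import List


def monthly_indices(start: date, end: date, prefix: str = "logs-") -> List[str]:
    """Return the monthly index names (prefix + YYYY-MM) covering start..end inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month of the end date.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def search_path(start: date, end: date) -> str:
    """Join the index names into the path part of a _search request."""
    return ",".join(monthly_indices(start, end)) + "/_search"


print("GET " + search_path(date(2020, 4, 5), date(2020, 5, 20)))
```

The date range filter in the request body is still needed, since the first and last index only partly overlap the range.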

When I started using ILM, I switched to querying an index pattern with a date range filter:
GET logs-*/_search

This makes me think that with monthly indices (30GB each) and 1 replica,
even if the date filter is only the last 3 days, the query will (potentially) scan shards going back a few years.

My question is:

1. How can I prevent this scan?
Elasticsearch supports the pre_filter_shard_size parameter,
but it seems the pre-filter only becomes active when the number of shards is > 128, probably for performance reasons.

How meaningful is this for Lucene (scanning a shard that has 0 documents relevant to the date filter)?
Other thoughts?
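
For reference, pre_filter_shard_size can be passed as a query-string parameter on the search request. A minimal sketch of what such a request could look like, assuming the timestamp field is called `@timestamp` (not stated in this post) and using a made-up helper name `build_search_request`. It only builds the path and body and sends nothing:

```python
import json


def build_search_request(index_pattern="logs-*", field="@timestamp",
                         gte="now-3d/d", pre_filter_shard_size=1):
    """Build the path and body of a _search request with a date range filter.

    The field name is an assumption; replace it with the date field
    actually used in the indices.
    """
    # Lower the threshold so the pre-filter round trip is requested as soon
    # as the search expands to more than this many shards.
    path = f"{index_pattern}/_search?pre_filter_shard_size={pre_filter_shard_size}"
    body = {
        "query": {
            "bool": {
                "filter": [
                    {"range": {field: {"gte": gte}}}
                ]
            }
        }
    }
    return path, body


path, body = build_search_request()
print("GET " + path)
print(json.dumps(body, indent=2))
```

Whether lowering the threshold actually helps depends on the Elasticsearch version and the number of shards, so it is worth measuring rather than assuming.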

warkolm · May 20, 2020, 11:01pm

There have been a few improvements to Elasticsearch for handling this type of range query, so you don't need to worry about it reading all the data in all the shards: it will skip shards that don't contain relevant data.
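
One way to check this on a real cluster is the `_shards` section of the search response, which reports a `skipped` count alongside `total`. A minimal sketch; the sample response below is made up for illustration and is not output from any real cluster, and `summarize_shards` is a hypothetical helper name:

```python
def summarize_shards(response: dict) -> str:
    """Report how many shards a search skipped, based on its _shards section."""
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    skipped = shards.get("skipped", 0)
    return f"{skipped} of {total} shards skipped"


# Illustrative numbers only: a made-up response for a short date range
# searched across many monthly indices.
sample_response = {
    "_shards": {"total": 48, "successful": 48, "skipped": 45, "failed": 0}
}
print(summarize_shards(sample_response))
```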

liorg2 · May 21, 2020, 4:28am

Thanks a lot! Anywhere I can read about it?

