I have an ES 6.8.6 cluster containing about 1,200 day-wise indices. Indexing happens only on today's day-wise index; there are no updates. The cluster holds about 45 TB of data, preserving historical data for up to the last 7 years. Each day-wise index is configured with 1 shard and 1 replica. The primary averages around 16 GB, so with 1 replica the total size of a day-wise index averages around 32 GB.
Currently, searching the past 18 months of data hits 18 * 30 = 540 shards, and even with powerful data nodes it takes about 22 seconds to return a response. The mapping has very few fields.
Would the response time be better if I searched a small number of bigger shards instead of a large number of smaller shards?
I was thinking of combining the past 16-GB, 1-shard day-wise indices into 480-GB, 10-shard monthly indices. E.g., today is 6-Nov-2020, so 18 months back would be 6-May-2019. So, reindex the 30 day-wise indices for May-2019 into a monthly index with 10 shards, and do the same for June-2019 all the way up to Oct-2020. For Nov-2020, I can keep it day-wise and, once Nov-2020 is over, reindex those into a monthly index as well.
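As a rough sketch of that reindex step (the index names `logs-2019.05.*` and `logs-2019-05` are placeholders for whatever naming scheme the cluster actually uses), the destination monthly index could be created first with the desired shard count and then filled via the `_reindex` API, e.g. from Kibana Dev Tools:

```
PUT logs-2019-05
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}

POST _reindex?wait_for_completion=false
{
  "source": { "index": "logs-2019.05.*" },
  "dest":   { "index": "logs-2019-05" }
}
```

`wait_for_completion=false` returns a task ID that can be polled via the Tasks API, which is useful when reindexing hundreds of GB.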
With this approach, a search over the last 18 months of data would hit at the max - say on 30th Nov - `(17 months * 10 shards) + (1 current month * 30 shards) = 200 shards`. On 6th Nov, it would hit 176 shards.
If this approach works, then I can look to replace day-wise indices with monthly indices altogether.
Take a look at using ILM to manage this, rolling over at a larger shard size than what you have (e.g. 50 GB).
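For reference, a minimal ILM rollover policy along those lines might look like the following (the policy name and thresholds are illustrative, not a tuned recommendation for this cluster):

```
PUT _ilm/policy/logs-monthly
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      }
    }
  }
}
```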
I would not be surprised if it were faster, but it may depend on the data and queries. I would recommend reindexing one month and comparing the latency of querying the monthly index against the daily ones. If it is better or close, I suspect switching to monthly indices could clearly be worthwhile.
Thanks @Christian_Dahlqvist. That makes sense. And I suppose having a monthly index is much better than force-merging all the day-wise indices? All past day-wise indices are effectively read-only since no updates happen.
Whether to forcemerge or not is a separate issue. It might be a good idea even for monthly indices.
Understood. I was just wondering if, instead of reindexing daily into monthly, I could forcemerge the daily indices and improve response time. I agree that forcemerging can be performed on read-only monthly indices as well.
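If the daily (or monthly) indices are forcemerged, it may be worth explicitly blocking writes first so that no stray indexing undoes the merge (the index name below is a placeholder):

```
PUT logs-2020.11.01/_settings
{
  "index.blocks.write": true
}
```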
@Christian_Dahlqvist - I reindexed 6 months of daily indices into 6 monthly indices and was able to reduce the response time to less than 5 seconds, even with 0 replicas. Thanks a bunch!
One more question: I am planning to forcemerge the monthly indices now since they are read-only. The average size of a monthly index is 550 GB, and I chose 12 shards so that each shard stays within the limit of 50 GB. I am thinking of setting max_num_segments = 5 for the forcemerge. Is that okay, or should I set it to 1?
If I set it to 1, then each segment is likely to be a whopping ~50 GB in size. Even with 2 segments, each segment will be ~25 GB, which also seems high. So I thought to go with 5, which means each segment will be approximately 10 GB. Any recommendation here, please?
Going to 1 segment should reduce heap usage and improve search speed, so that is what I would recommend.
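A forcemerge down to a single segment would then be, per monthly index (the index name is again a placeholder; note that on a 550 GB index this call can run for a long time, so a generous client-side timeout helps):

```
POST logs-2019-05/_forcemerge?max_num_segments=1
```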
Thanks Christian. But wouldn't it matter that, with 1 segment, each segment would be super large at 50-60 GB?
Sounds fine to me. 50GB isn't "super-large" at all.
Thanks David. Glad to hear 50 GB isn't "super large".
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.