Compressing and forcemerging past time-based indices

Thanks @DavidTurner. Do you mean one force-merge thread for the entire cluster, or one per shard?

That's a very good question; sorry, I should have been more precise. It's one thread per node, so it's worth starting as many force-merges as you can: they'll run in sequence on each node, but in parallel across the whole cluster.
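For illustration, something along these lines would kick off force-merges for several read-only indices at once and let each node work through its own queue (a rough Python sketch against the REST API; the host and index names are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

import requests

ES = "http://localhost:9200"  # placeholder cluster address
INDICES = ["logs-2023.01", "logs-2023.02", "logs-2023.03"]  # placeholder read-only indices


def force_merge(index):
    # Each request blocks until its merge completes, so running them on
    # separate threads lets Elasticsearch queue one merge per node while
    # the nodes work in parallel.
    resp = requests.post(
        f"{ES}/{index}/_forcemerge",
        params={"max_num_segments": 1},
        timeout=24 * 3600,  # merging large indices can take hours
    )
    resp.raise_for_status()
    return index, resp.json()


with ThreadPoolExecutor(max_workers=len(INDICES)) as pool:
    for index, result in pool.map(force_merge, INDICES):
        print(index, result.get("_shards"))
```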


Excellent, thanks David for the clarification. I usually run force-merges using the Curator tool, invoked via a cron job on each data node.

Hi @Christian_Dahlqvist and @warkolm

Just wanted to update you on some amazing numbers I ended up getting by applying best_compression to past read-only monthly indices and then force-merging them.

Of the past 18 monthly indices that I compressed and force-merged, the minimum size reduction I achieved was a staggering 42.5% (an index of 515 GB came down to 297 GB), with the maximum being 45% (from 645 GB to 354 GB). This has almost halved my storage requirements :tada: Heartfelt thanks!
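For reference, here's a minimal sketch of that recipe in Python against the REST API (the host and index name are placeholders; index.codec is a static setting, so the index has to be closed to change it, and the force-merge is what rewrites the existing segments with the new codec):

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address
INDEX = "logs-2023-01"        # placeholder read-only monthly index

# index.codec is a static setting, so the index has to be closed to change it.
requests.post(f"{ES}/{INDEX}/_close").raise_for_status()
requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index": {"codec": "best_compression"}},
).raise_for_status()
requests.post(f"{ES}/{INDEX}/_open").raise_for_status()

# The new codec only applies to segments written from now on, so force-merge
# to rewrite everything into a single, recompressed segment per shard.
requests.post(
    f"{ES}/{INDEX}/_forcemerge",
    params={"max_num_segments": 1},
    timeout=24 * 3600,  # large merges can take hours
).raise_for_status()
```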

However, I have one question. When the monthly index was initially 645 GB (it was created by re-indexing day-wise indices into a monthly index to improve query performance), I opted for 12 shards so that each shard would be around 40-55 GB in size. But now, with best_compression, the size is down by almost 45% to 354 GB. My question is: should I reduce the number of shards from 12 to 6? Given that these indices are already force-merged, will shrinking from 12 to 6 help, or will I have to force-merge again after shrinking?

Hey @Christian_Dahlqvist / @warkolm / @DavidTurner - I would really appreciate some insights on this.

Apologies for multiple tags.

Tricky to say. 20-30 GB isn't unreasonably small for shards, so IMO you could just leave them at 12; OTOH there is usually some data duplicated across shards (e.g. the terms dictionary), so shrinking might save you some more space. You don't have to force-merge anything, and again it's tricky to say whether it'll improve things further. I don't think you'll find another 40% of space savings, but it depends on the details of your data.

Sorry there's no definitive answer here, I don't think we can offer more guidance than to try it and see. You've certainly got space to experiment now :slight_smile:


Thanks a ton, David. Totally understood, and that helps a lot. I was just looking for some guidance on whether it is worth trying or not, and from your input it seems worthwhile to try shrinking and see. I agree that 20-30 GB isn't too small for shards and that shrinking may not have much impact. The only reason I considered shrinking is that when searching the past 18 months of data (the current month's index plus the past 18 monthly indices), a query currently hits 19 indices * 12 shards = 228 shards. If I reduce each index to 6 shards, the same query would hit only 114 shards. That was my reasoning.

Of course, I'm not expecting 40% space savings again. More than happy with the gains I've obtained :sweat_smile:

So I suppose I can run Curator to shrink each index from 12 shards to 6. And since the data is already force-merged, I reckon each index will end up with 6 segments across its primaries instead of the current 12.
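Roughly the raw-API equivalent of what I'd have Curator do (the index names, node name and host below are just placeholders):

```python
import requests

ES = "http://localhost:9200"    # placeholder cluster address
SOURCE = "logs-2023-01"         # placeholder 12-shard monthly index
TARGET = "logs-2023-01-shrunk"  # placeholder 6-shard target index

# Shrink needs a copy of every shard on one node and writes blocked.
requests.put(
    f"{ES}/{SOURCE}/_settings",
    json={
        "index.routing.allocation.require._name": "data-node-1",  # placeholder node name
        "index.blocks.write": True,
    },
).raise_for_status()

# Wait for the shard relocation to finish before shrinking.
requests.get(
    f"{ES}/_cluster/health/{SOURCE}",
    params={"wait_for_no_relocating_shards": "true", "timeout": "1h"},
).raise_for_status()

# Shrink the 12 primaries down to 6 (the target count must divide the source
# count) and clear the temporary settings on the new index.
requests.post(
    f"{ES}/{SOURCE}/_shrink/{TARGET}",
    json={
        "settings": {
            "index.number_of_shards": 6,
            "index.routing.allocation.require._name": None,
            "index.blocks.write": None,
        }
    },
).raise_for_status()
```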

Thanks for all your inputs.

That makes sense; searching fewer larger shards can be more efficient (because of how the data structures scale) but it can also be less efficient (e.g. uses fewer parallel threads). It all depends...

Shrinking doesn't necessarily adjust the segment count. If you currently have twelve 20 GB shards, each with a single segment, then shrinking by a factor of two will almost certainly leave you with six 40 GB shards, each with two segments.

And that means that, after shrinking, the index will need to be force-merged again if I want to reduce the segments to one per shard?

Got it. In that case, I think I'm better off with what I have now :slight_smile:
