I have a number of past monthly indices that are effectively read-only, since no data is being indexed to them. I was thinking of applying best_compression to them and then invoking the _forcemerge API to reduce num_segments to 1.
I can close those monthly indices --> apply best_compression --> open them again --> forcemerge them.
Is this the correct approach? Or is it that best_compression cannot be applied to existing indices?
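For concreteness, here is a minimal sketch of that sequence using curl, assuming a hypothetical monthly index named logs-2023-01 and a cluster reachable on localhost:9200:

```
# Close the index before changing the codec (index.codec is a static setting)
curl -X POST "localhost:9200/logs-2023-01/_close"

# Switch the codec to best_compression
curl -X PUT "localhost:9200/logs-2023-01/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"codec": "best_compression"}}'

# Reopen the index, then force merge down to a single segment
# (only newly written segments use the new codec, so the merge re-encodes them)
curl -X POST "localhost:9200/logs-2023-01/_open"
curl -X POST "localhost:9200/logs-2023-01/_forcemerge?max_num_segments=1"
```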
Thank you. Yes, agreed on that; I will measure it. One question here: in case I opt for best_compression and find that query times have increased, what options do I have?
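One simple way to measure the storage effect is to record the on-disk size before and after the codec change plus force merge, sketched here with the _cat/indices API (index name hypothetical):

```
# Compare primary and total store size before and after the change
curl "localhost:9200/_cat/indices/logs-2023-01?v&h=index,pri.store.size,store.size"
```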
Using best_compression primarily adds overhead at indexing time, so it should not affect query latencies at all. I also do not think you need to close the index.
No, as far as I know that will not change anything. Be aware that force merging large shards can take a long time and result in a significant amount of disk I/O.
Hi @Christian_Dahlqvist - would it be okay to force merge multiple indices (say 3 to 5) in parallel in a large 50 TB cluster (10 data nodes and 3 master nodes; each data node has 8 TB of storage, 16 cores, and 56 GB RAM), or is the recommendation to force merge only one index at a time?
It typically depends more on how fast your storage is than on the amount of RAM or CPU, as force merging results in a lot of disk I/O and consumes extra disk space while in progress. Try it and see how it affects your cluster and what limit you can tolerate.
Note that there is only one force-merge thread. If a force merge is already running and you request another, the later requests are placed in a queue and processed in turn.
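As a rough sketch of kicking off several force merges at once (index names and host hypothetical), each request blocks until its merge completes, so they can be launched in the background; per the note above, requests beyond the per-node force-merge thread simply queue and are processed in turn:

```
# Fire force-merge requests for several indices in parallel;
# excess merges queue behind the single force-merge thread on each node
for idx in logs-2023-01 logs-2023-02 logs-2023-03; do
  curl -s -X POST "localhost:9200/${idx}/_forcemerge?max_num_segments=1" &
done
wait   # block until all requests have returned
```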