Compressing and forcemerging past time-based indices

I have a bunch of past monthly indices that are effectively read-only, since no data is being indexed into them anymore. I was thinking of applying best_compression to them and then invoking the _forcemerge API to reduce the number of segments to 1.

I can close those monthly indices --> apply best_compression --> open them again --> forcemerge them.

Is this the correct approach? Or can best_compression not be applied to existing indices?

Yep!
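
For reference, the sequence proposed above can be written out as an ordered list of REST calls. This is only a minimal sketch: it assumes the standard `_close`, `_settings`, `_open` and `_forcemerge` endpoints, and the helper name `compression_plan` and the index name `logs-2019-01` are made up for illustration. It builds the requests without sending them, so they can be reviewed first and then sent with whatever client you use.

```python
import json


def compression_plan(index, codec="best_compression", max_num_segments=1):
    """Return the REST calls, in order, as (method, path, body) tuples.

    Hypothetical helper: nothing is sent to a cluster here.
    """
    settings_body = json.dumps({"index": {"codec": codec}})
    return [
        ("POST", f"/{index}/_close", None),              # close the index
        ("PUT", f"/{index}/_settings", settings_body),   # change the codec
        ("POST", f"/{index}/_open", None),               # reopen the index
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None),
    ]


# Example index name is an assumption, not taken from the thread.
for method, path, body in compression_plan("logs-2019-01"):
    print(method, path, body or "")
```

Existing segments only pick up the new codec when they are rewritten, which is why the force merge comes last. Passing `codec="default"` would produce the equivalent plan for switching back.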

Thank you very much :+1: Will it have any performance impact on queries?

Better compression isn't free. It'll have some impact, but how much you will need to measure.
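
One way to do that measurement is to time the same representative query repeatedly before and after the change and compare medians. The snippet below is a rough sketch, not a proper benchmark: `median_latency_ms` and `fake_query` are hypothetical names, and `fake_query` is a stand-in that you would replace with a function that sends a real search request to your cluster.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=20):
    """Call run_query() `repeats` times and return the median wall time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_query():
    # Stand-in workload (assumption); swap in a real search call here.
    sum(range(10_000))


print(f"median: {median_latency_ms(fake_query):.3f} ms")
```

Using the median rather than the mean keeps one slow outlier (for example a cold cache on the first run) from dominating the comparison.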

Understood. Thanks. I was hoping that forcemerging to 1 segment would keep the impact small.

It shouldn't, but I don't think you will find anyone who will guarantee that.

Thank you. Yes, agreed on that. Will measure it. One question here: in case I opt for best_compression and find that the query times have increased, what options do I have?

You should be able to revert that with a settings change and another force merge.

Awesome. Thank you very much.

Using best_compression primarily adds overhead at indexing time, so it should not affect query latencies at all. I also do not think you need to close the index.

Wow. That's great to hear. So just set the compression and forcemerge to 1 segment?

Yes.

Excellent. Thanks. By the way, if we have forcemerged before to 5 segments, we can again forcemerge to 1 segment. Correct?

No, as far as I know that will not change anything. Be aware that forcemerging large shards can take a long time and result in a good amount of disk I/O.

Noted. Thanks for all your inputs. Much appreciated.

@Christian_Dahlqvist and @warkolm - truly amazed at the support you guys are providing. Very much grateful.

Hi @Christian_Dahlqvist - would it be okay if I forcemerge multiple indices (say 3 to 5) in parallel in a huge cluster of 50 TB (10 data nodes and 3 master nodes; each data node has 8 TB, 16 cores and 56 GB RAM), or is the recommendation to forcemerge only 1 index at a time?

It typically depends more on how fast your storage is than on the amount of RAM or CPU, as it results in a lot of disk I/O and uses up disk space while in progress. Try it and see how it affects your cluster and what limit you can tolerate.

Note that there is only one force-merge thread. If a force merge is already running and you request another then the later requests are placed in a queue and processed in turn.
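
The queueing behaviour described above can be pictured with a toy model: a pool with a single worker, where every extra request waits its turn. This is only an illustration in plain Python, not how Elasticsearch implements it; `run_in_turn` and `fake_merge` are hypothetical names and the sleep stands in for the actual merge work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_turn(indices, merge):
    """Submit one task per index to a single worker; return completion order."""
    finished = []

    def task(name):
        merge(name)
        finished.append(name)

    # One worker thread: later submissions queue up behind the running one.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for name in indices:
            pool.submit(task, name)
    # Leaving the `with` block waits for all queued tasks to finish.
    return finished


def fake_merge(name):
    # Stand-in for a long-running force merge (assumption).
    time.sleep(0.01)


print(run_in_turn(["logs-01", "logs-02", "logs-03"], fake_merge))
# prints ['logs-01', 'logs-02', 'logs-03']
```

In this model, sending several requests at once does not make them run concurrently on the same worker; they simply complete in submission order.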

Thanks Christian. The data nodes' SSDs have a maximum of 5000 IOPS and a maximum throughput of 200 MB/sec.