I have a number of past monthly indices that are effectively read-only, since no data is being indexed to them. I was thinking of applying best_compression to them and then invoking the _forcemerge API to reduce num_segments to 1.
I can close those monthly indices --> apply best_compression --> open them again --> forcemerge them.
Is this the correct approach? Or is it that best_compression cannot be applied to existing indices?
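For concreteness, here is a minimal sketch of that sequence using curl, assuming a hypothetical monthly index named logs-2023-01 and a cluster reachable on localhost:9200:

```
# Close the index before changing the codec (index.codec is a static setting)
curl -X POST "localhost:9200/logs-2023-01/_close"

# Switch the codec to best_compression
curl -X PUT "localhost:9200/logs-2023-01/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"codec": "best_compression"}}'

# Reopen the index, then force merge down to a single segment
# (only newly written segments use the new codec, so the merge re-encodes them)
curl -X POST "localhost:9200/logs-2023-01/_open"
curl -X POST "localhost:9200/logs-2023-01/_forcemerge?max_num_segments=1"
```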
Thank you. Yes, agreed on that; I will measure it. One question here: in case I opt for best_compression and find that query times have increased, what options do I have?
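One simple way to measure the storage effect is to record the on-disk size before and after the codec change plus force merge, sketched here with the _cat/indices API (index name hypothetical):

```
# Compare primary and total store size before and after the change
curl "localhost:9200/_cat/indices/logs-2023-01?v&h=index,pri.store.size,store.size"
```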
Using best_compression primarily adds overhead at indexing time, so it should not affect query latencies at all. I also do not think you need to close the index.
No, as far as I know that will not change anything. Be aware that force merging large shards can take a long time and result in a significant amount of disk I/O.
Hi @Christian_Dahlqvist - would it be okay to force merge multiple indices (say 3 to 5) in parallel in a large 50 TB cluster (10 data nodes and 3 master nodes; each data node has 8 TB of storage, 16 cores, and 56 GB RAM), or is the recommendation to force merge only one index at a time?
It typically depends more on how fast your storage is than on the amount of RAM or CPU, as force merging results in a lot of disk I/O and consumes extra disk space while in progress. Try it and see how it affects your cluster and what limit you can tolerate.
Note that there is only one force-merge thread. If a force merge is already running and you request another, the later requests are placed in a queue and processed in turn.
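As a rough sketch of kicking off several force merges at once (index names and host hypothetical), each request blocks until its merge completes, so they can be launched in the background; per the note above, requests beyond the per-node force-merge thread simply queue and are processed in turn:

```
# Fire force-merge requests for several indices in parallel;
# excess merges queue behind the single force-merge thread on each node
for idx in logs-2023-01 logs-2023-02 logs-2023-03; do
  curl -s -X POST "localhost:9200/${idx}/_forcemerge?max_num_segments=1" &
done
wait   # block until all requests have returned
```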