How can I easily compress 140 indices (some of which are up to 100 GB)?
I have tried:
- closing the indices,
- changing the codec to 'best_compression',
- and then executing a forcemerge on the index.
The forcemerge task finishes very quickly, and the index does not change its size.
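Concretely, this is roughly the sequence I run per index (a minimal sketch with the official `elasticsearch` Python client, 8.x; the host and index name are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host
index = "my-index-000001"                    # placeholder index name

# index.codec is a static setting, so the index has to be closed first.
es.indices.close(index=index)
es.indices.put_settings(index=index, settings={"index.codec": "best_compression"})
es.indices.open(index=index)

# Merge down to one segment so the segments get rewritten with the new codec.
es.indices.forcemerge(index=index, max_num_segments=1)
```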
However, if I manually reindex, the index is compressed by up to 60%.
Is there a way to do this in bulk? I don't mind having to do it with a Python script. Thank you!
"Note: I made a script to reindex one by one, but some reindexing takes more than 10 hours, and it's complex to sequence with so much delay between reindexations. I've also tried parallelizing several reindexations at once. But what I would like is to let this parallelization and sequencing be done and managed by the cluster itself."
Please don't ping folks to draw them into a conversation like that, especially not after just a few minutes. It's very rude and violates the community code of conduct. We're all just volunteers here.
Forcemerging should be quicker than reindexing, so I would recommend doing what you described. The only reason the forcemerge task would finish very quickly is if the index has already been forcemerged down to a single segment. If you are doing this as part of an index lifecycle policy, I would recommend changing the codec there as well.
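The ILM forcemerge action accepts an `index_codec` option for exactly this. A rough sketch via the Python client (the policy name, phase, and `min_age` are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Hypothetical policy; the forcemerge action's index_codec option tells
# ILM to rewrite the merged segments with best_compression.
es.ilm.put_lifecycle(
    name="my-policy",
    policy={
        "phases": {
            "warm": {
                "min_age": "7d",
                "actions": {
                    "forcemerge": {
                        "max_num_segments": 1,
                        "index_codec": "best_compression",
                    }
                },
            }
        }
    },
)
```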
To force a forcemerge, I think you need to increase the number of segments in the shards. You can do this by indexing a dummy document with a known ID and then immediately deleting it. If your indices have more than 1 primary shard, you may need to index and delete multiple documents so you know all shards have more than 1 segment.
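Something along these lines (a rough sketch with the Python client; the document count and IDs are arbitrary, and with multiple primary shards there is no guarantee which shard a given ID lands on):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host
index = "my-index-000001"                    # placeholder index name

# Write and immediately delete a few throwaway documents so each shard
# picks up at least one extra segment. The IDs hash to different shards,
# so a handful of docs is a best-effort way to touch them all (the count
# of 10 is an assumption, not a guarantee).
for i in range(10):
    doc_id = f"dummy-forcemerge-{i}"
    es.index(index=index, id=doc_id, document={"dummy": True})
    es.delete(index=index, id=doc_id)

# Refresh so the writes are flushed into segments, then merge back
# down to a single segment under the current codec.
es.indices.refresh(index=index)
es.indices.forcemerge(index=index, max_num_segments=1)
```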
Thanks @Christian_Dahlqvist! The thing is, these indices are in the last stage of the ILM, so the forcemerge has already been done. That is why nothing happens even when I try to forcemerge.
What can I do to apply this in bulk across all the indices? Can I run the reindex in some kind of batch mode?
Any reindexing you will need to manage yourself, e.g. using a script. Why not add and delete a document in each index and then perform another forcemerge down to a single segment, as I suggested, instead of reindexing?
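If you do end up reindexing, one way to let the cluster do most of the work is to submit each reindex with `wait_for_completion=false` and have the script only poll the tasks API. A rough sketch (8.x Python client; the index list, destination naming scheme, and parallelism limit are all placeholders):

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host
indices = ["idx-001", "idx-002"]             # placeholder: the 140 indices
max_parallel = 3                             # assumption: ~one heavy reindex per data node

pending = list(indices)
running = {}  # task ID -> source index

while pending or running:
    # Top up the pool of cluster-side reindex tasks.
    while pending and len(running) < max_parallel:
        src = pending.pop(0)
        resp = es.reindex(
            source={"index": src},
            dest={"index": f"{src}-compressed"},  # hypothetical naming scheme
            wait_for_completion=False,            # returns a task ID immediately
        )
        running[resp["task"]] = src

    time.sleep(30)

    # Reap finished tasks via the tasks API.
    for task_id in list(running):
        if es.tasks.get(task_id=task_id).get("completed"):
            print(f"done: {running.pop(task_id)}")
```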
Yes, that should work. If you do not want the additional document to pollute the index, you can also remove it before running the forcemerge. I am not sure whether you may also need to run a refresh.
Now I have my 140 indices in the forcemerge queue! But the thing is, I have 3 servers with enough resources to run more than one force_merge at the same time.
I tried to increase the force_merge thread pool.
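As far as I understand, `thread_pool.force_merge.size` is a static node setting in elasticsearch.yml that needs a restart, and it defaults to a small fixed size, so for now I am parallelizing from the client side instead. A minimal sketch (8.x Python client; the index list and worker count are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch

# Long request timeout, since each forcemerge call blocks until the
# merge finishes and merges of ~100 GB indices can take hours.
es = Elasticsearch("http://localhost:9200", request_timeout=36000)
indices = ["idx-001", "idx-002", "idx-003"]  # placeholder list

def merge(idx: str) -> str:
    es.indices.forcemerge(index=idx, max_num_segments=1)
    return idx

# Fire several forcemerge calls concurrently so shards on different
# nodes can merge in parallel; server-side, each node still works
# through its own force_merge thread pool.
with ThreadPoolExecutor(max_workers=3) as pool:  # assumption: ~one per data node
    for done in pool.map(merge, indices):
        print(f"forcemerged: {done}")
```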