I recently ran a reindex in which I had to update docs multiple times, and I also deleted a huge number of docs.
I see that the deleted and updated docs are still occupying disk space. From my research so far, force merging segments is not a good idea, since it could create more segments of huge size and could backfire. What is the safest way to reclaim that unwanted space? Or should I just leave it alone and expect it to go away on its own over time?
Below is a screenshot of my index, which has 1 replica. You can see the highlighted part. It is way more
It's not clear what those numbers actually represent, and I'm not familiar with the UI you've taken a screenshot of there. Is it possible that the numbers in brackets are the total of primary plus replica, and the numbers outside the brackets are just the primaries? I ask because 5.51TB is about twice 2.79TB, and also because there's no way to measure the size of documents-that-have-not-been-deleted so that can't be what that 2.79TB is reporting.
But yes, as a rule, you should leave merging alone and let Lucene do its thing.
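If it helps, here is a rough Python sketch of how you could check both points yourself instead of relying on the UI. It assumes you have fetched one row from `GET _cat/indices/<index>?format=json&bytes=b` (columns `docs.count`, `docs.deleted`, `store.size`, `pri.store.size`). The function name `summarize_index`, the index name, and the sample values are made up for illustration and are not taken from your cluster.

```python
def summarize_index(row, replicas=1):
    """Summarize one row of `_cat/indices?format=json&bytes=b` output.

    With format=json the values arrive as strings, so they are cast to int.
    """
    total_bytes = int(row["store.size"])        # primaries plus replicas
    primary_bytes = int(row["pri.store.size"])  # primaries only
    live_docs = int(row["docs.count"])
    deleted_docs = int(row["docs.deleted"])     # deleted but not yet merged away

    all_docs = live_docs + deleted_docs
    return {
        # Should be close to 1 + replicas if the larger figure includes replicas.
        "total_to_primary": total_bytes / primary_bytes,
        "expected_ratio": 1 + replicas,
        # Share of documents in the segments that are marked as deleted.
        "deleted_fraction": deleted_docs / all_docs if all_docs else 0.0,
    }


# Hypothetical sample row; the sizes only mirror the 5.51 vs 2.79 proportion.
sample = {
    "index": "my-index",
    "docs.count": "900",
    "docs.deleted": "100",
    "store.size": "5510",
    "pri.store.size": "2790",
}

summary = summarize_index(sample, replicas=1)
print(summary)
```

A `total_to_primary` value near 2 with 1 replica would support the primary-plus-replica reading of the two numbers, and watching `deleted_fraction` shrink over time is one way to confirm that background merges are reclaiming the space.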