Hi @jpountz, thanks a lot for your response.
Please let me know if my understanding is correct: according to the documentation, increasing reclaim_deletes_weight will make merges favor segments with more deleted docs. Will existing segments that have already reached max_merged_segment also be affected by this setting?
The reason I am worried about the deleted documents is that they are taking up a lot of space. I have read these awesome blogs by Michael: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html and https://www.elastic.co/blog/lucenes-handling-of-deleted-documents.
Here are some of the reasons why I am worried about the deleted docs.
We have a lot of nested type fields in the Elasticsearch mapping, which creates many documents for each primary doc we have. We currently have about 20 billion docs (about 3.5 TB with deleted docs, about 2.2 TB without deleted docs) in the index with 20 primary shards, and the data is growing at a rapid pace. I am worried about reaching the limit of 2 billion docs per shard with a lot of deleted docs.
In terms of capacity planning, from @mikemccand's blogs, it seems that we have to account for 1.5X disk space. One thing I read in the blog is that TieredMergePolicy does not reclaim deleted documents from segments that have already reached the max_merged_segment size. So I was wondering whether, for my use case, it would be better to use another merge policy such as LogByteSizeMergePolicy to reduce the overall number of deleted documents. But if TieredMergePolicy is the way to go, then we will account for the 1.5X disk space in our infrastructure planning.
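To make the concern concrete, here is a rough back-of-the-envelope sketch in Python using the figures quoted above. The doc count, shard count and sizes come from this post; the even distribution across shards, the assumption that the 20 billion count already includes nested docs, and the helper names are mine for illustration only. The limit constant is Lucene's per-shard maximum as I understand it (Integer.MAX_VALUE - 128).

```python
# Figures quoted in this post; everything else is an illustrative assumption.
TOTAL_DOCS = 20_000_000_000       # docs in the index (assumed to include nested docs)
PRIMARY_SHARDS = 20
SIZE_WITH_DELETES_TB = 3.5
SIZE_WITHOUT_DELETES_TB = 2.2
LUCENE_MAX_DOCS = 2_147_483_519   # per-shard hard limit (Integer.MAX_VALUE - 128)


def shard_headroom(total_docs, shards, max_docs=LUCENE_MAX_DOCS):
    """Return (average docs per shard, fraction of the limit used).

    Assumes docs are spread evenly across shards, which is only an approximation.
    """
    per_shard = total_docs / shards
    return per_shard, per_shard / max_docs


def disk_overhead(size_with_deletes, size_without_deletes):
    """Return the ratio of on-disk size to live-data size."""
    return size_with_deletes / size_without_deletes


per_shard, used = shard_headroom(TOTAL_DOCS, PRIMARY_SHARDS)
overhead = disk_overhead(SIZE_WITH_DELETES_TB, SIZE_WITHOUT_DELETES_TB)

print(f"average docs per shard: {per_shard:,.0f}")
print(f"share of per-shard limit used: {used:.1%}")
print(f"disk overhead from deleted docs: {overhead:.2f}x")
```

With these inputs the average shard is already at roughly half of the per-shard limit, and the current disk overhead is a little above the 1.5X planning figure, which is why I am asking about the merge policy.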
Please advise.