Tuning merge policy for large number of updates

Hi All,
I am using ES 1.7.3. My cluster handles a large number of updates daily. Ever since we switched from ES 1.3.9 to ES 1.7.3, we have noticed about 150% more disk usage as these updates happen. I looked at the indices and segments, and it turns out the additional disk usage is due to the number of deleted (i.e. updated) documents. If I run _optimize?only_expunge_deletes=true, I am able to reclaim disk space, but ES does not seem to reclaim it on its own when merges occur.
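For what it's worth, running the expunge step by hand does free the space; a minimal sketch of what I run, with my_index and localhost standing in for our actual index and host:

    curl -XPOST 'http://localhost:9200/my_index/_optimize?only_expunge_deletes=true'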

Here are our current merge policy settings:

        "index.merge.policy.reclaim_deletes_weight": "6.0"
        "index.merge.policy.max_merged_segment": "1gb"
        "index.store.throttle.type": "merge"     
    "index.store.throttle.max_bytes_per_sec": "200mb"

With ES 1.3.9, we used to have

    "index.merge.policy.reclaim_deletes_weight": "2.0"
    "index.merge.policy.max_merged_segment": "5gb"
    "index.store.throttle.type": "none"     
"index.store.throttle.max_bytes_per_sec": "20mb"

These settings worked perfectly fine with ES 1.3.9: disk usage was not a problem despite the large number of updates. With 1.7.3, however, disk usage creeps up alarmingly.

I do have multi-gigabyte shards due to the large data volume. The cluster also serves a heavy query rate (about 4-5 thousand queries per second). I use SSD-backed machines and haven't seen disk I/O or memory bottlenecks; it's only the disk usage that is a problem. How do I tune my merge policy to keep up with the updates? I am OK with indexing being a little slower due to additional merges, but I cannot sustain this kind of disk growth. Please advise!

Thanks,
Madhav.

Could you have a look at _cat/segments and see if the biggest segments are the ones with all the deleted documents?
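Something along these lines should show it per segment (substitute your index name; the column list is just the one I'd look at first):

    curl 'http://localhost:9200/_cat/segments/your_index?v&h=index,shard,prirep,ip,segment,docs.count,docs.deleted,size'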

Hi Nik,
Yes, this is what it looks like (columns: index, shard, prirep, ip, segment, generation, docs.count, docs.deleted, size, size.memory, committed, searchable, version, compound) -

&part-largecatalog0      5 r 10.101.54.105 _1iqg  70936 2738676 346368    3.7gb 10107002 true true 4.10.4 true
&part-largecatalog0      5 p 10.101.56.193 _1n2o  76560 2737806 345876    3.7gb 10106554 true true 4.10.4 true
&part-largecatalog0      2 r 10.101.56.28  _1cao  62592 2737559 345841    3.7gb 10109746 true true 4.10.4 true
&part-largecatalog0      2 p 10.101.61.227 _1g8x  67713 2734559 345039    3.7gb 10098938 true true 4.10.4 true
&part-largecatalog0      5 r 10.101.60.45  _15za  54406 2733689 344958    3.7gb 10090434 true true 4.10.4 true
&part-largecatalog0      2 r 10.101.55.203 _1bgr  61515 2733323 344583    3.7gb 10097050 true true 4.10.4 true
&part-largecatalog0      1 p 10.101.53.160 _1e64  65020 2676213 333414    3.6gb  9854106 true true 4.10.4 true
&part-largecatalog0      0 r 10.101.59.128 _104z  46835 2689569 333226    3.6gb  9937554 true true 4.10.4 true
&part-largecatalog0      4 p 10.101.63.231 _1562  53354 2678859 332846    3.6gb  9896026 true true 4.10.4 true
&part-largecatalog0      4 r 10.101.59.42  _1af8  60164 2679411 332785    3.6gb  9900530 true true 4.10.4 true

Earlier I had all the default settings for ES 1.7, except

"index.store.throttle.type": "none" 

When this index was created, max_merged_segment was at its default (5gb). When I saw deletes accumulating, I set it to 1gb and tried to tune the other settings to recover space, but that did not work. The percentage of deleted documents is spread fairly evenly across segments and is about 12%. Considering this, I have a couple of questions:

  1. What percentage of deleted docs does a segment need before it is considered for merging? (I have been looking at the deletes-related settings sketched below.)
  2. If max_merged_segment had been set to 1gb before loading the data, would that have helped with the growing percentage of deleted docs? Would reindexing help?
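
For reference, these are the deletes-related knobs I have been experimenting with; a minimal sketch of the dynamic update, with my_index and localhost as placeholders (my reading is that expunge_deletes_allowed only applies to the only_expunge_deletes path, so please correct me if that is wrong):

    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
      "index.merge.policy.expunge_deletes_allowed": "10",
      "index.merge.policy.reclaim_deletes_weight": "6.0"
    }'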

Thanks,
Madhav.