Remove deleted documents from large segments

Deleted documents are only actually cleaned up when segments merge.
But once a segment grows beyond a size limit (default: 5 GB), it is no longer selected for merging. In that case, how would you suggest forcing the removal of deleted documents from such a segment (say 40% of its docs are deleted)?

There is an option to set our own value for "index.merge.policy.max_merged_segment", but that alone doesn't seem to fully solve this situation.
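For reference, that setting can be updated dynamically. A minimal sketch, assuming an index named my_index and a node on localhost:9200 (the "10gb" value is only illustrative):

```
# Raise the maximum merged-segment size so that larger segments
# remain merge candidates.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.merge.policy.max_merged_segment": "10gb"
}'
```

The trade-off is that merges themselves get bigger and more expensive, so this mostly defers the problem rather than removing it.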

How about exposing a REST action (something along the lines of optimize?only_expunge_deletes) that doesn't depend on segment merging, but just goes ahead and removes all deleted documents? (The new segments would map 1:1 onto the old segments, but with the deleted docs cleaned out.)
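For comparison, this is roughly how the existing expunge-deletes hook is invoked (assuming the 1.x-era _optimize endpoint, later renamed _forcemerge; my_index is a hypothetical index name):

```
# Ask Elasticsearch/Lucene to rewrite segments in order to drop
# deleted docs. Under the hood this still goes through merging,
# which is exactly the dependency the proposal wants to avoid.
curl -XPOST 'localhost:9200/my_index/_optimize?only_expunge_deletes=true'
```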

I don't think this is a great idea but I don't have a great idea. Folks tend to make less than stellar decisions when they have direct control of merging. It is surprisingly fraught.

The merge policy will pick those big segments to merge eventually - it really likes to merge similarly sized segments, and it doesn't consider merges that don't look like they'll stay under the limit. I believe it starts to consider merging two max-sized segments once enough of their docs are deleted that the result would come in under the limit. It even places a premium on expunging deleted docs! But it seems like you still end up with a high proportion of deleted docs in practice when you are interactively updating a lot.
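That "premium" corresponds to TieredMergePolicy's reclaim-deletes weight, which Elasticsearch exposes as an index setting (setting name as in the 1.x docs, if I remember right; the value below is only an illustration):

```
# Increase the weight that candidate merges get for reclaiming
# deleted docs (the default is 2.0; higher values make the policy
# favor delete-heavy segments when scoring merges).
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.merge.policy.reclaim_deletes_weight": 3.0
}'
```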

There are three ways to attack this, and I've tried them over the years:

  1. Do fewer updates/deletes. Batch and deduplicate update operations (see the sketch after this list). OTOH, updates to a small segment aren't nearly as big a deal as updates to a big segment.
  2. Try and get a patch into Lucene's merge policy. I've tried and never got too far. I believe the class is TieredMergePolicy.
  3. Throw more hardware at the problem. Ultimately those deleted docs "only" cost you disk space, I/O, and RAM. They don't actually cost much in the way of CPU. Anyway, those are all things money can buy, and that's turned out to be cheaper for me in the past than the other two options.
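On number 1, "batch and deduplicate" just means collapsing repeated writes to the same document into a single operation before indexing, for example via the bulk API (index name, type, and ids here are hypothetical):

```
# If doc 42 was updated five times in this batch window, keep only
# the latest version so Lucene marks one old copy deleted, not five.
curl -XPOST 'localhost:9200/_bulk' --data-binary '{"index":{"_index":"my_index","_type":"doc","_id":"42"}}
{"field":"latest value only"}
{"index":{"_index":"my_index","_type":"doc","_id":"43"}}
{"field":"one write instead of many"}
'
```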

Part of the trouble with number 2 is that merge policy changes can have lots of unintended consequences. The code is eminently readable, even easy to understand and change. But the consequences reach a long, long way.

> I don't think this is a great idea.

Based on your experience, I would like to know more about the issues and challenges with this solution. What consequences would need to be thought through here?

To make this approach clear: it would be more of a Lucene-level implementation, where I would read one segment (using a PostingsReader) and write it directly (via a PostingsWriter) to a new segment, skipping the deleted documents. This would be independent of the merging that happens. Also, until we have the perfect policy in place, this could be a manual operation rather than an automatically triggered one.

This is what expunge_deletes actually does, except that it will merge multiple "has deletes to remove" segments at once. And all parts of the index need to be rewritten, not only postings (stored fields, doc values, norms, term vectors, and so on).

Note that max-sized segments (> 5 GB by default) become eligible for merging again once they cross 50% deletions.

Why are the 40% deletions causing problems...? For heavy-update use cases, this number should range between ~40% and ~60% once at steady state, and I'm not sure we could do much better without a heavier indexing cost.
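To see where a given index actually sits, the cat segments API reports per-segment deleted-doc counts (my_index is hypothetical, and the h= column selection is assumed to be supported on your version):

```
# Compare docs.count with docs.deleted per segment; the big
# segments approaching 50% deleted are the ones waiting to
# become merge-eligible again.
curl -XGET 'localhost:9200/_cat/segments/my_index?v&h=segment,size,docs.count,docs.deleted'
```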

This blog post goes into some details: Lucene's Handling of Deleted Documents | Elastic Blog

Mike McCandless
