Deleted documents stuck above 40%

Hi,
I recently removed many documents from my index, but after 3 weeks the index still shows more than 40% deleted documents. I know the merge process should eventually clean this up, but it hasn't happened yet and the index size has not gone down. I also don't want to run a force merge with the expunge-deletes option, as the last time I did that my index became very unstable (yellow).
What can be done to make Elasticsearch clean up all the deleted docs (reindexing is also not an option)?
Thanks

Can you please post the output of GET /_cat/segments/<index_name>?v
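
If it helps, a minimal sketch of that call with curl (host and index name are placeholders):

    curl -s 'http://localhost:9200/_cat/segments/my_index?v'

The docs.deleted and size columns are the interesting ones here.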

Well, it's a really big output to paste here.

Can you please use pastebin or maybe gist?

Please:

You have at least 3 segments there that are really big (~5GB), and most of the deleted documents are in them. Due to the way the merging algorithm works, it is going to take quite some time until they get merged.
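
A possibly relevant detail: those segments sit right at the merge policy's default maximum segment size (index.merge.policy.max_merged_segment, 5gb by default), and as far as I understand, segments at that cap are only reconsidered for merging once deletes have shrunk their effective size. A hedged sketch of raising the cap dynamically (my_index and the 10gb value are just illustrative, and whether that is wise depends on your hardware):

    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
      "index.merge.policy.max_merged_segment": "10gb"
    }'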

There are a few possible causes for this. Do you constantly update and/or delete documents very fast? Did you call forcemerge in the past and then keep indexing into it?

For now, the fastest way to fix this is to reindex.

Usually I don't do massive deletes, but I did one a few weeks ago.
I didn't call forcemerge on that index.
And reindexing is really not an option: the index doesn't store the _source field, so I would have to reindex from the original data, which is about 0.5 PB.

If you want to keep that index and purge the deleted documents, then the only way is to forcemerge it down to 1 segment and stop writing to the index (i.e. no deletes/indexing/updates); otherwise it will just make things worse.
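
For reference, a sketch of what that would look like (host and index name are placeholders; on 1.x the endpoint is _optimize, renamed to _forcemerge in 2.x):

    # 1.x
    curl -XPOST 'http://localhost:9200/my_index/_optimize?max_num_segments=1'
    # 2.x and later
    curl -XPOST 'http://localhost:9200/my_index/_forcemerge?max_num_segments=1'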

So just waiting for merging will not solve the problem? I can't stop writing to it, it is used by customers.

It's going to take a while and it's very difficult to estimate.

Check the videos in this blog post to get an idea: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Thanks, I think waiting is the only option for me. Btw, if I decide to go for the expunge-deletes force merge, can I tell the index not to perform any write operations?

You can mark an index as read-only. Actually, you can keep writing to that index; the only real issue is if you end up in the same situation by doing a massive delete again.

You should avoid doing massive deletes anyway. If your data has a timestamp, it is strongly recommended to use time-based indices: when you need to delete documents based on a time range, you can drop entire indices instead, which is much cheaper.
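
For example, with daily indices a time-range delete becomes a cheap index drop (the index name is just illustrative):

    curl -XDELETE 'http://localhost:9200/events-2016.01.15'

Deleting a whole index frees the disk space immediately, instead of leaving tombstones that merging has to clean up later.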

To make an index read-only, set index.blocks.read_only: true
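
Something along these lines (index name is a placeholder):

    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
      "index.blocks.read_only": true
    }'

Set it back to false once the merge has finished.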

No, my index is not time-based.
So what is the best practice if I do need to perform a massive delete again?

Setting the index to read-only will not prevent merging?
Btw, I once did a force merge to 1 segment on a big index, and it caused the index to go yellow and many shards went into recovery. Can an expunge-deletes-only merge also cause this?

Thanks

How did you perform this massive delete? Was it using the bulk API? Or maybe using delete by query?

It is not expected to prevent merging. Indeed, the forcemerge API is generally recommended as a housekeeping operation for smaller indices that are read-only. You could try setting it to only expunge deletes; it may help, but I don't expect much, since you have massive segments with deleted documents.
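
A sketch of the expunge-deletes-only variant (again, _optimize on 1.x, _forcemerge on 2.x+; index name is a placeholder):

    curl -XPOST 'http://localhost:9200/my_index/_optimize?only_expunge_deletes=true'

This only rewrites segments whose share of deleted documents exceeds a threshold (10% by default), so it is cheaper than merging everything down to one segment.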

I used the bulk API to delete.
Btw, if I delete an entire type, will that release the space, or will the behavior be exactly the same?
I'm really afraid of the force merge operation, as my index is huge.
Thanks

It looks like you are on a very old version of Elasticsearch (1.7 maybe?), because I see Lucene version 4.10 in your segments. While I'm not sure this is the exact issue here, I do remember an issue long ago with deletions alone not triggering merges: https://issues.apache.org/jira/browse/LUCENE-6166.

You should upgrade to a more recent version, but in your case, that will mean reindexing.

There is no difference, as there is no physical separation of the types; all types go into the same physical shard.

While doing the bulk delete, did you change the index refresh interval? What refresh interval do you use?

I checked, and my refresh interval is 20 sec, as I don't need a real-time index.
Do you think I can make it even bigger?
Thanks

Yes, my ES is really old, 1.7.5.
The bug you referenced, as I understand it, is about an index with only deletions and no new inserts, correct? In my case we also keep indexing, and heavily.
Upgrading means reindexing, and that is something we can't afford right now.

Thanks

This is not a bug; it is simply how Elasticsearch merging works. This happened because you have the refresh interval set to 20 secs and probably used a large bulk size for the deletes as well. To avoid problems with massive deletes in the future, set refresh to 1 sec and use small bulks (around 100-300 documents).
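
A hedged sketch of both pieces of that advice (host, index, type and document IDs are placeholders; on 1.x bulk delete actions also need a _type):

    # lower the refresh interval before a massive delete
    curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
      "index.refresh_interval": "1s"
    }'

    # then delete in small bulks of a few hundred documents
    curl -XPOST 'http://localhost:9200/_bulk' -d '
    {"delete":{"_index":"my_index","_type":"my_type","_id":"1"}}
    {"delete":{"_index":"my_index","_type":"my_type","_id":"2"}}
    '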

For now you could try a forcemerge with expunge deletes only, but this might have unwanted consequences; the best thing to do would be reindexing.