Need Help / Suggestion with Deleted Documents


(Atif) #1

Currently one of the index that we have shows the following stats

The stats show that the total documents count is 2081 whereas a considerable amount of documents are marked as deleted. This has happened I think due to luncene engine's default settings.

We can reclaim the space using expunge deletes option but is it possible to somehow unmark / restore the deleted documents in another cluster or within the same cluster.


(Adrien Grand) #2

Can you let us know what version you are running and whether you have errors in the logs (eg. failed merged).


(Atif) #3

Hi Jpountz,

The index was previously created on ElasticSearch Version 1.4, recently the ElasticSearch was upgraded to 1.7 after I took over the work.
I have checked with the status of the index and it states no-merges have taken place as shown in the diagram below.

I have tried to check the logs but for this index there are no logs detailing any error.

However, while going through different statuses - I think i need to upgrade the index. I am sorry a bit new to this advanced configurations of elasticsearch as previously we were using basic configurations with small tweaking. But now are thinking of moving to production so looking into these issues.

Underneath are the detail screenshots of the index.






If you see in the screenshot in couple of another the same issue has happened - One more addition to it in all these cases the Index has one 1 shard.


(Adrien Grand) #4

@mikemccand Does it ring a bell to you?


(Christian Dahlqvist) #5

It is interesting to see that the number of deleted documents are almost consistently around 10000 times the number of existing documents per the count. Is this from a test or production environment? Are you using TTL? Are you perhaps updating or indexing the same documents repeatedly with the same IDs, resulting in deleted documents as a side effect of updates?


(Michael McCandless) #6

Is it possible you did a bunch of deletes and didn't add any new documents
to this index?

Lucene previously would fail to trigger a merge in that case:
https://issues.apache.org/jira/browse/LUCENE-6166

This is fixed in Lucene 5.3 / ES 2.1.0.

If you add one document to the indices does that trigger a merge?

Mike McCandless


(Atif) #7

@Christian_Dahlqvist yes you are correct the deleted documents are as per you have mentioned. This is from the test environment as I am pushing one copy to test environment but I am now faced with this issue because we have to run some analysis on all the information collected and with documents marked as deleted it seems that most of it is lost. I have checked the original program from which this index was generated and no the documents cannot have same ids.

@mikemccand no the documents were not deleted and now new documents were added. My initial view was also the same it has something but did not come across this patch thanks for it. Bytw is it okay if I trigger a merge manually directly on lucene, will I be able to claim part of the documents?

Thanks @Christian_Dahlqvist, @mikemccand
for your comments.


(system) #8