Going to hit Lucene 2B document limit, want to confirm which value


(James Harbal) #1

I have an index where the shards are getting near the ~2.14B document limit for the "max_doc" value.
I just want to make sure that this is the value that will make it stop indexing new documents.
Or does it go by the current "num_docs" value, which is at 1.25B so far (0.9B deleted/updated)? That would let me relax a little.

Thanks


(Mark Walkom) #2

Deleted docs shouldn't count, but maybe a proper Lucene expert can confirm.


(James Harbal) #3

OK, I couldn't find the answer before, but I think I found it here: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents, which says that deleted docs do count towards the 2.1B limit. Ughh



(Mark Walkom) #4

Damn :confounded:


(Colin Goodheart-Smithe) #5

Yep, the limit applies to max_doc, since Lucene internally uses an integer id to refer to each document. Because deletes are soft deletes, each deleted document still keeps an id assigned to it so Lucene can tell which documents are deleted.
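You can check how close each shard is from the stats API (assuming a hypothetical index name my-index; per shard, Lucene's max_doc is roughly docs.count plus docs.deleted):

```
GET /my-index/_stats/docs?level=shards
```

Compare the per-shard count + deleted sum against ~2.14B to see how much headroom you actually have.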

A solution to your issue could be to start indexing into a new index and use an alias so you can refer to both the new and the existing index as if they were one index. This would avoid you needing to re-index all the existing data. Given that at that point the existing index would rarely change, you could then optimize (force merge) it to remove the deleted documents entirely. The downside here is that if you still need to be able to delete documents from the existing index, whatever is indexing the content will need to be aware of both indices and delete the document from the correct one.
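A minimal sketch of that setup, assuming hypothetical index names old-index and new-index and an alias search-alias (searches go through the alias; writes should target new-index directly, unless your version supports marking a write index on the alias):

```
POST /_aliases
{
  "actions": [
    { "add": { "index": "old-index", "alias": "search-alias" } },
    { "add": { "index": "new-index", "alias": "search-alias" } }
  ]
}

POST /old-index/_forcemerge?only_expunge_deletes=true
```

The force merge with only_expunge_deletes rewrites segments with deleted documents, reclaiming those ids on old-index; it can be I/O-heavy, so run it during a quiet period.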


(system) #6