Indexed documents disappearing randomly within a short time after indexing

Hello, I'm having a weird issue with Elasticsearch where indexed documents disappear from the index shortly after being indexed.
My application indexes a couple of thousand documents (not a huge batch) and I can see the number of newly indexed documents in Kibana, but after a short time this number drops, as if those documents had never been indexed.
There is enough disk space, there is no ID collision, and other documents are being indexed to the same index and do not disappear. I've never seen this strange behaviour before.

To briefly explain what I do:
I have an index of 'articles', and an app translates those articles. Once translated, the article receives a new ID, a new property (Language = "de", "es", etc.) and some additional 'tags', and is stored in the same index the original article came from. If the app has translated 5K articles, I expect to see them in the index. I see them briefly, then they start disappearing in batches, until the count eventually settles at something like 160 or 230, not the 5K that were already ingested. What is going on?

I'd be grateful for any hints.

Thanks in advance!

How are your IDs constructed? If you keep track of the ID of the documents you have inserted that disappear and try to GET them by ID, what do you get?

The original ID looks like this: "C5EA78A22B04F30CD0E14F54D85EC0AF".

The new ID is an MD5 hash of the original ID + "_{Language}", so it looks completely different, yet I can still derive it from the original and check whether an article has been processed or not. When I try to get the translated article by its new ID, I can retrieve it, but only until it disappears. After that it's as if it had never been indexed: it simply doesn't exist. Strange.
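For readers following along, here is a minimal sketch of how such an ID could be derived. It assumes the hash input is the original ID immediately followed by "_" and the language code, and that the digest is upper-cased to match the look of the original IDs; neither detail is confirmed in the thread, and the function name `translated_id` is made up for illustration.

```python
import hashlib


def translated_id(original_id: str, language: str) -> str:
    """Derive a deterministic ID for a translated article.

    Assumption: the hash input is "<original_id>_<language>".
    Assumption: the hex digest is upper-cased, like the original IDs.
    """
    raw = f"{original_id}_{language}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest().upper()


original = "C5EA78A22B04F30CD0E14F54D85EC0AF"
german_id = translated_id(original, "de")
spanish_id = translated_id(original, "es")

# Same inputs always give the same ID, so the app can recompute it
# later to check whether a given article was already translated.
print(german_id != spanish_id, german_id != original)
```

Because the ID is deterministic, two different languages of the same article never collide with each other or with the original, which is consistent with the statement above that there is no ID collision.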

I am running a test now in which the translated articles are not stored in the index the original articles came from, but in a new index ("translated-articles", for example), and it looks like the articles don't disappear from the new index. I would love to understand what's going on.

Elasticsearch does not delete documents automatically and there is no TTL, so I suspect you have some process either deleting or overwriting the documents in question.


Aha, OK. I will try to confirm that and find out which other process could be interacting with the index. I do have multiple 'micro apps', each doing its part (pre-processing/topical scoring, the translating app, etc.), so that could indeed be the reason.

Thank you very much for your help! Your reply helped me shift focus and identify the problem. It was indeed another process that was deleting the newly translated articles: a de-duplication system was matching on one property that was being cloned into the translated articles :)
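To illustrate the failure mode for anyone landing here later, below is a small in-memory simulation, not the actual de-duplication system (its logic is not described in this thread). The property name `source_hash`, the document IDs, and the function `dedupe` are all hypothetical. The sketch shows how a de-duplicator keyed on a property that translations inherit unchanged treats every translation as a duplicate of its original.

```python
from typing import Dict, List, Sequence


def dedupe(docs: Dict[str, dict], keys: Sequence[str]) -> List[str]:
    """Keep the first document per combination of `keys`; delete the rest.

    Returns the IDs that were deleted. `docs` is modified in place and
    stands in for the index.
    """
    seen = set()
    deleted = []
    for doc_id, doc in list(docs.items()):
        fingerprint = tuple(doc.get(k) for k in keys)
        if fingerprint in seen:
            del docs[doc_id]
            deleted.append(doc_id)
        else:
            seen.add(fingerprint)
    return deleted


def build_index() -> Dict[str, dict]:
    # Hypothetical documents: translations clone "source_hash" unchanged.
    index = {
        "orig-1": {"language": "en", "source_hash": "abc"},
        "orig-2": {"language": "en", "source_hash": "def"},
    }
    index["de-1"] = {**index["orig-1"], "language": "de"}
    index["es-1"] = {**index["orig-1"], "language": "es"}
    index["de-2"] = {**index["orig-2"], "language": "de"}
    return index


# Keyed on the cloned property alone: every translation is removed.
naive_index = build_index()
naive_deleted = dedupe(naive_index, ["source_hash"])

# Keyed on the cloned property plus the language: nothing is removed.
safe_index = build_index()
safe_deleted = dedupe(safe_index, ["source_hash", "language"])

print(naive_deleted, safe_deleted)
```

Including the language in the de-duplication key is only one possible fix; writing translations to a separate index, as tried earlier in the thread, or not cloning the property at all would avoid the problem as well.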
