Elasticsearch documents getting deleted

We’ve been experiencing an issue where certain documents in our Media index go missing. What’s unusual is that it’s often the same documents that disappear each time.

Here’s what we’ve observed:

  • We have a process that deletes older media when new media is indexed.

  • Initially, we suspected that deletes might be happening after new documents were indexed, but the behavior is inconsistent.

    • In some cases, documents disappear within seconds of indexing.

    • In other cases, the same documents stay for weeks before disappearing.

  • Because of this, we’re not sure if the issue is tied to our delete logic or something else.

We also tried enabling audit logs to investigate further, but we’re struggling with filtering. Specifically:

  • We delete documents based on an ExpandKey field.

  • We’d like the cluster to log only delete events where ExpandKey starts with a certain prefix, instead of logging all delete operations (which creates a lot of noise).

We’d really appreciate guidance on:

  1. Best practices for tracking down unexpected deletions in Elasticsearch/OpenSearch.

  2. How to configure audit logging (or another mechanism) to capture only the deletes that match certain field criteria.

P.S. I have realized that we are using routing when indexing documents but not when deleting them. I will try to rectify this, but could this be the cause of the disappearing documents?
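To illustrate what I mean, here is a simplified sketch of indexing with a routing value and then deleting without one, assuming the OpenSearch.Client high-level .NET client (the type, field names, and method here are made up for illustration, not our actual code):

    using System.Threading.Tasks;
    using OpenSearch.Client;

    public class MediaDoc
    {
        public string Id { get; set; }
        public string DataSourceId { get; set; }   // e.g. "1371"
    }

    public static class RoutingMismatchExample
    {
        public static async Task RunAsync(IOpenSearchClient client, MediaDoc doc, string expandKey)
        {
            // Index WITH a routing value: the routing value decides which shard the document lives on.
            await client.IndexAsync(doc, i => i
                .Index("media_1_7")
                .Id(doc.Id)
                .Routing(doc.DataSourceId));

            // Delete by query WITHOUT a routing value: the query is fanned out to every shard,
            // so it should still find and delete the documents. It is single-document
            // deletes/gets by id that can silently miss when the routing value is omitted.
            await client.DeleteByQueryAsync<MediaDoc>(d => d
                .Index("media_1_7")
                .Query(q => q.Term("Metadata.ExpandKey.keyword", expandKey)));
        }
    }

So my current understanding is that the routing mismatch would mainly affect deletes and gets by id rather than our _delete_by_query calls, but I would still like to make them consistent.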

OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance. See What is OpenSearch and the OpenSearch Dashboard? | Elastic for more details.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

Which product are you using? It is quite possible that the answer depends on this.

It sounds like you might be using delete by query to remove old data. It would be useful to have more detail about exactly how your deletion process works.

This will depend a lot on the product and version used, which you have not specified.

That depends on how you delete data.

  1. I am using the managed OpenSearch Service from AWS.

  2. Yes, I am using delete by query.

So for our media index, whenever we receive an update for a piece of media, we first delete the previously existing media documents using an ExpandKey.

Here’s an example:

    POST /media_1_7/_delete_by_query
    {"query":{"query_string":{"query":"Metadata.ExpandKey.keyword: 13711637256"}}}

The ExpandKey is a concatenation of {datasourceID}{listingID}, so {1371}{1637256} becomes 13711637256.

Different data sources can have the same ListingID, but the combination should be unique across the whole system.
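In code the key is built by simple string concatenation, roughly like this (a simplified sketch, not the exact production code):

    // Hypothetical helper showing how the key is put together.
    static string BuildExpandKey(string dataSourceId, string listingId)
        => dataSourceId + listingId;   // "1371" + "1637256" => "13711637256"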

So before indexing the new media, we have code to delete older media like so:

        indexRecords = indexRecords.Where(doc => doc.DoIndex).ToList();

        // run a delete before indexing the documents
        var deleteResponses = await indexRecords
            .Where(x => x.Indexing.DeleteQuery != null)
            .Select(_esRepo.DeleteDocumentAsync)
            .WhenAll()
            .ConfigureAwait(false);

        foreach (var error in deleteResponses.Where(x => x.OriginalException != null))
        {
            // log errors
        }

        // now index all the records
        var bulkResponse = await _esRepo.IndexDocumentsAsync(indexRecords);
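For context, _esRepo.DeleteDocumentAsync boils down to issuing the _delete_by_query shown earlier. Roughly like this, assuming the OpenSearch.Client .NET API (a simplified, hypothetical version, not the real repository code):

    using System.Threading.Tasks;
    using OpenSearch.Client;

    // Hypothetical sketch of what the repository method does; the real code differs.
    public static async Task<DeleteByQueryResponse> DeleteByExpandKeyAsync(
        IOpenSearchClient client, string deleteQueryJson)
    {
        // deleteQueryJson holds the raw query, e.g.
        // {"query_string":{"query":"Metadata.ExpandKey.keyword: 13711637256"}}
        return await client.DeleteByQueryAsync<object>(d => d
            .Index("media_1_7")
            .Query(q => q.Raw(deleteQueryJson)))
            .ConfigureAwait(false);
    }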
And to answer the version question: we are on OpenSearch version OpenSearch_2_11_R20250630.

I hope this answers your questions!

OpenSearch is a different product from Elasticsearch so I would recommend you reach out to the OpenSearch community or AWS support. Their implementation of security and audit logging is completely different from Elasticsearch and I do not know whether there are any special limitations or peculiarities related to their managed service.

Are the two parts fixed length, or is it possible that {1371}{1637256} could exist at the same time as e.g. {137}{11637256}? Both would concatenate to the same ExpandKey, 13711637256.
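To make that concrete, here is a tiny sketch (values taken from your example) showing how two different pairs collapse into the same key unless the parts are fixed length or separated by a delimiter:

    using System;

    class ExpandKeyCollision
    {
        static void Main()
        {
            // Two different (dataSourceId, listingId) pairs...
            var a = "1371" + "1637256";   // datasource 1371, listing 1637256
            var b = "137" + "11637256";   // datasource 137,  listing 11637256
            Console.WriteLine(a == b);    // True: both are "13711637256"

            // ...stay distinct with a separator (or fixed-width, zero-padded parts):
            var a2 = "1371" + "-" + "1637256";   // "1371-1637256"
            var b2 = "137" + "-" + "11637256";   // "137-11637256"
            Console.WriteLine(a2 == b2);         // False
        }
    }

If a pair like that exists, a _delete_by_query on ExpandKey.keyword: 13711637256 would remove both listings' media, which would also explain why it is often the same documents that disappear.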

I think you might’ve caught the problem! I am so shocked I didn’t consider it xD

Thanks a lot! I think the problem has mostly been resolved.