Elasticsearch Head - Find and Remove Duplicate Documents

Hi All,

Background:
Users of my application noticed duplicate transactions. Investigating further, we found duplicate documents in our Elasticsearch database, likely because we did not restart our Logstash/Filebeat servers properly.

Problem - duplicates of the same txn in elasticsearch:
Example -> txn_ref "0011XX8711234".

Note that the values of the 'message body' fields are all the same, with the exception of unique identifiers such as id, createdDate and more.

So I read up on duplicates in other similar posts and came up with the following query to find them:

POST -> sample_index*/_search

{
  "aggs": {
    "duplicate_docs": {
      "terms": {
        "field": "source.txn_ref",
        "size": 2,
        "min_doc_count": 2
      }
    }
  }
}


The search results didn't match the duplicates identified by testers, which probably means the query fields aren't being used properly?
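For anyone trying to reproduce this, here is a minimal Python sketch of the same terms aggregation, with the parts I am unsure about changed. It is not a confirmed fix. The function names are made up for illustration, and the field name `txn_ref.keyword` is an assumption: it presumes txn_ref is a top-level field with a keyword sub-field, which depends on the index mapping (the "source." prefix in my query may have been confusion with `_source`). The sketch also sets the top-level "size" to 0 so only aggregation buckets come back, and raises the terms "size", since a value of 2 returns at most two buckets no matter how many duplicates exist.

```python
from typing import Any, Dict, List, Tuple


def build_duplicate_query(field: str, max_buckets: int = 1000) -> Dict[str, Any]:
    """Build a search body that groups documents by `field` and keeps
    only the groups containing two or more documents."""
    return {
        "size": 0,  # skip the hits, return only the aggregation
        "aggs": {
            "duplicate_docs": {
                "terms": {
                    "field": field,          # assumed name, check the mapping
                    "size": max_buckets,     # upper bound on returned buckets
                    "min_doc_count": 2,      # a bucket needs at least 2 docs
                }
            }
        },
    }


def duplicate_keys(response: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Extract (value, doc_count) pairs from the aggregation part of a
    search response shaped like the one Elasticsearch returns."""
    buckets = (
        response.get("aggregations", {})
        .get("duplicate_docs", {})
        .get("buckets", [])
    )
    return [(b["key"], b["doc_count"]) for b in buckets if b["doc_count"] >= 2]
```

The dictionary from `build_duplicate_query("txn_ref.keyword")` can be sent as the JSON body of the same POST to sample_index*/_search, and `duplicate_keys` then lists each txn_ref value that occurs more than once along with its count.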

If anyone could advise on how I can move forward, I would appreciate it.

Thank You
