Hi All,
Background:
Users of my application noticed duplicate transactions. Investigating further, we found duplicates in our Elasticsearch database, likely because we did not restart our Logstash/Filebeat servers properly.
Problem - duplicates of the same txn in elasticsearch:
Example -> txn_ref "0011XX8711234".
Note that the 'message body' fields all have the same values, with the exception of unique identifiers such as id, createdDate and a few others.
I read up on duplicates in other similar posts and came up with the following query to find them:
POST sample_index*/_search
{"aggs":{"duplicate_docs":{"terms":{"field":"source.txn_ref","size":2,"min_doc_count":2}}}}
The search results didn't match the duplicates identified by our testers, which probably means I am not using the query fields properly.
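For reference, here is a minimal sketch of a variation I am considering. It rests on two guesses of mine that I have not confirmed: that "size": 2 limits the terms aggregation to its two largest buckets, so most duplicated txn_ref values never show up, and that txn_ref is mapped as text with a default ".keyword" sub-field, which is what the aggregation would need to target. The helper names build_duplicate_query and extract_duplicates are hypothetical, and the field path is simply the one from my query above with ".keyword" appended.

```python
from typing import Any, Dict


def build_duplicate_query(field: str, max_buckets: int = 1000) -> Dict[str, Any]:
    """Build a search body listing values of `field` that occur in 2+ documents."""
    return {
        "size": 0,  # skip the hits; only the aggregation buckets are needed
        "aggs": {
            "duplicate_docs": {
                "terms": {
                    "field": field,
                    # "size": 2 would return only the two largest buckets
                    "size": max_buckets,
                    "min_doc_count": 2,
                }
            }
        },
    }


def extract_duplicates(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each duplicated value to its document count from a search response."""
    buckets = response["aggregations"]["duplicate_docs"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Assumption: txn_ref is a text field with a ".keyword" sub-field.
body = build_duplicate_query("source.txn_ref.keyword")
```

The resulting body would be sent to the same sample_index*/_search endpoint, and extract_duplicates would be run on the parsed JSON response. If the field is already mapped as keyword, the plain field name would be the one to pass instead.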
If anyone could advise on how to move forward, I would appreciate it.
Thank You