How to identify and remove duplicates in an Elasticsearch index

We are using Elasticsearch 7.11.1. We recently observed an issue; the details are below.

  1. We store data at 15-minute intervals, and the timestamp comes from our input file (e.g. 05:00, 23:15, 20:30, 11:45).
  2. Recently we observed that our input file for 23:15 has 1890 records, but the index has 3533 records.
  3. We now want to delete the 1643 duplicate records from the index without disturbing the remaining 1890 records.

We need an API query for that.

For example:

Input file:

name   product  sale  id
sai    pen      100   1
kumar  car      30    2
sai    pen      100   1
sai    pen      100   1
ram    bike     288   3
kumar  car      30    2

After deleting the duplicates, my index should look like this:

name   product  sale  id
sai    pen      100   1
ram    bike     288   3
kumar  car      30    2

I need help with:

  1. A query to find only the duplicates at 23:15
  2. A query to delete the duplicates

This should not be an issue, I believe. I would suggest creating a visualization on your data: take Unique Count as the metric, then select Terms on the fields that should identify a unique value. Then edit that visualization and inspect the request, adjusting the time range accordingly. That gives you a query, which is generally a GET.

Then, with a small modification, you can pass it to the delete API: where the request has search?q=, pass the query as a JSON object instead.
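
For illustration, here is a rough sketch of what such a search body might look like. It is not the exact request the visualization produces. It assumes the interval is stored in a field called `timestamp` and that the `id` field identifies a record; both names, the `"23:15"` value format, and the helper `build_duplicate_search` are assumptions, so adjust them to your mapping. It uses a terms aggregation with `min_doc_count: 2`, so only keys that occur more than once come back, plus a `top_hits` sub-aggregation to list the document `_id` of each copy.

```python
import json


def build_duplicate_search(timestamp_field, timestamp_value, key_field, max_keys=2000):
    """Build a _search body listing key values that occur more than once.

    All field names are supplied by the caller; nothing here is taken
    from the real index mapping.
    """
    return {
        "size": 0,  # we only want the aggregation, not the hits themselves
        "query": {"term": {timestamp_field: timestamp_value}},
        "aggs": {
            "duplicate_keys": {
                "terms": {
                    "field": key_field,
                    "min_doc_count": 2,  # keep only keys seen at least twice
                    "size": max_keys,    # assumed upper bound on distinct keys
                },
                "aggs": {
                    # return the _id of each copy, without the document body
                    "copies": {"top_hits": {"size": 100, "_source": False}}
                },
            }
        },
    }


# Hypothetical field names and timestamp value.
body = build_duplicate_search("timestamp", "23:15", "id")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of `GET <your-index>/_search`. Note that `top_hits` returns at most 100 copies per key with default index settings, so a key duplicated more often than that would need more than one pass.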

Can you please share the query for this?
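
One possible approach, sketched under assumptions rather than tested against your index: run a search with a terms aggregation (`min_doc_count: 2`) and a `top_hits` sub-aggregation over the 23:15 documents, then keep the first document `_id` in each bucket and delete the rest through `_delete_by_query` with an `ids` query. Be careful not to run `_delete_by_query` with a query that simply matches the duplicated values, because that would delete every copy, including the one you want to keep. The aggregation names `duplicate_keys` and `copies`, the helper names, and the sample response below are all hypothetical; the sample mirrors the example data in the question (id 1 three times, id 2 twice).

```python
def ids_to_delete(search_response, agg_name="duplicate_keys", hits_name="copies"):
    """Return the _ids of every copy except the first one in each bucket."""
    doomed = []
    for bucket in search_response["aggregations"][agg_name]["buckets"]:
        doc_ids = [hit["_id"] for hit in bucket[hits_name]["hits"]["hits"]]
        doomed.extend(doc_ids[1:])  # keep doc_ids[0], delete the others
    return doomed


def build_delete_body(doc_ids):
    """Build a _delete_by_query body that targets only the given _ids."""
    return {"query": {"ids": {"values": doc_ids}}}


# Hand-written stand-in for a search response; the _id values are made up.
sample_response = {
    "aggregations": {
        "duplicate_keys": {
            "buckets": [
                {"key": 1, "doc_count": 3,
                 "copies": {"hits": {"hits": [{"_id": "a1"}, {"_id": "a2"}, {"_id": "a3"}]}}},
                {"key": 2, "doc_count": 2,
                 "copies": {"hits": {"hits": [{"_id": "b1"}, {"_id": "b2"}]}}},
            ]
        }
    }
}

to_delete = ids_to_delete(sample_response)
delete_body = build_delete_body(to_delete)
print(to_delete)
print(delete_body)
```

The resulting body would go to `POST <your-index>/_delete_by_query`. It is worth comparing the number of collected IDs with the expected 1643 before deleting anything.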
