How to identify duplicates and delete them from an index

We are using Elasticsearch 7.11.1. We recently observed an issue; the details are below.

  1. We store data at 15-minute intervals, and the timestamp comes from our input file (e.g. 05:00, 23:15, 20:30, 11:45).

  2. Recently we noticed that our input file for 23:15 has 1890 records, but the index has 3533 records.

  3. We now want to delete the 1643 duplicate records from the index without disturbing the other 1890 records.

We need an API query for that.

For example:

Input file:

name   product  sale  id
sai    pen      100   1
kumar  car      30    2
sai    pen      100   1
sai    pen      100   1
ram    bike     288   3
kumar  car      30    2

After deleting duplicates, my index should look like this:

name   product  sale  id
sai    pen      100   1
ram    bike     288   3
kumar  car      30    2

I need help with:

  1. A query to find only the duplicates at 23:15
  2. A query to delete the duplicates
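
For the first item, one possible starting point is a terms aggregation with `min_doc_count: 2`, restricted to the 23:15 interval. The sketch below only builds the `_search` request body. The field names `timestamp` and `id`, the value format `"23:15"`, and the helper name `build_duplicate_search` are assumptions for illustration, not taken from the actual mapping. The `id` field must be aggregatable (keyword or numeric), and a terms aggregation returns at most `size` buckets, so a large index may need paging (for example with a composite aggregation).

```python
import json


def build_duplicate_search(ts_field, ts_value, key_field, max_groups=2000):
    """Build a _search body listing key values that occur more than once.

    All field names passed in are assumptions about the index mapping.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {ts_field: ts_value}},
        "aggs": {
            "duplicate_keys": {
                "terms": {
                    "field": key_field,
                    "min_doc_count": 2,  # keep only keys seen at least twice
                    "size": max_groups,
                },
                # Return the _id of each document in a duplicate group.
                "aggs": {"docs": {"top_hits": {"size": 100, "_source": False}}},
            }
        },
    }


body = build_duplicate_search("timestamp", "23:15", "id")
print(json.dumps(body, indent=2))
```

Each bucket in the response would then hold one duplicated `id` value together with the `_id`s of the documents sharing it. This only finds duplicates; it does not delete anything.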

This post is not new, but it can point you in the right direction on how to perform the operation.

It seems to be the same problem.

We are looking for an API query approach.

We are looking for an API-based solution and have not found one yet, so we are still searching.

Any help with an API query would be much appreciated.

As I pointed out in my response, I do not believe there is a query that can be used with delete by query to do this (which seems to be what you are looking for). I would therefore recommend looking at the approaches described in the blog post I linked to.
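
Since a single delete-by-query cannot express "keep one copy of each", the work usually moves to the client: fetch the documents for the interval, group them by their content, keep one `_id` per group, and delete the rest through the bulk API. The following is a minimal sketch of that idea and is not necessarily what the linked blog post does. The index name `sales`, the document IDs, and the helper names are hypothetical, and the sample hits mirror the example in the question.

```python
import json
from collections import defaultdict

# Fields that together define "the same record" (an assumption).
KEY_FIELDS = ("name", "product", "sale", "id")


def ids_to_delete(hits, key_fields=KEY_FIELDS):
    """Group hits by their key fields and return every _id except the first per group."""
    groups = defaultdict(list)
    for hit in hits:
        key = tuple(hit["_source"].get(f) for f in key_fields)
        groups[key].append(hit["_id"])
    return [doc_id for ids in groups.values() for doc_id in ids[1:]]


def bulk_delete_body(index, doc_ids):
    """Build the newline-delimited JSON body expected by the _bulk endpoint."""
    lines = [json.dumps({"delete": {"_index": index, "_id": d}}) for d in doc_ids]
    return "\n".join(lines) + "\n" if lines else ""


# Hits shaped like a _search response, using the rows from the question.
rows = [("sai", "pen", 100, 1), ("kumar", "car", 30, 2), ("sai", "pen", 100, 1),
        ("sai", "pen", 100, 1), ("ram", "bike", 288, 3), ("kumar", "car", 30, 2)]
hits = [{"_id": doc_id, "_source": dict(zip(KEY_FIELDS, row))}
        for doc_id, row in zip("abcdef", rows)]

duplicates = ids_to_delete(hits)
print(duplicates)  # ['c', 'd', 'f']
print(bulk_delete_body("sales", duplicates), end="")
```

The printed body could then be sent to `POST _bulk`. Running the search again before deleting, and checking that the remaining count matches the 1890 records in the input file, would be a sensible safeguard.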

Most people frequenting the forums respond only if they have a solution, so if you do not receive any solution within a reasonable time period it is quite possible that what you are looking for is not possible.
