How to identify and remove duplicates in Elasticsearch index

Komal_42 · June 22, 2022, 8:05am

We are using Elasticsearch 7.11.1. We recently observed an issue; the details are below.

We store data at 15-minute intervals, and we get the timestamp from our input file (e.g. 05:00, 23:15, 20:30, 11:45).
Recently we observed that our input file for 23:15 has 1890 records, but the index has 3533 records.
Now we want to delete the 1643 duplicate records from the index without disturbing the 1890 records.

We need an API query for that.

For example:

input file

name product sale id
sai pen 100 1
kumar car 30 2
sai pen 100 1
sai pen 100 1
ram bike 288 3
kumar car 30 2

After deleting duplicates, my index should look like this:

name product sale id
sai pen 100 1
ram bike 288 3
kumar car 30 2

I need help with:

A query to find only the duplicates at 23:15
A query to delete the duplicates
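
To make the goal concrete, here is a minimal Python sketch of the selection logic only: given search hits for the 23:15 interval, group them by their field values, keep one document per group, and mark the rest for deletion. The document IDs "a" to "f" are made up for illustration, and the hit shape ({"_id": ..., "_source": {...}}) is assumed to match what a search response returns.

```python
from collections import defaultdict

FIELDS = ("name", "product", "sale", "id")

def split_duplicates(hits, key_fields=FIELDS):
    """Group hits by their field values; keep the first _id of each group.

    Returns (keep_ids, delete_ids).
    """
    groups = defaultdict(list)
    for hit in hits:
        key = tuple(hit["_source"][field] for field in key_fields)
        groups[key].append(hit["_id"])
    keep_ids = [ids[0] for ids in groups.values()]
    delete_ids = [doc_id for ids in groups.values() for doc_id in ids[1:]]
    return keep_ids, delete_ids

# The sample rows from the input file, with hypothetical document IDs.
rows = [
    ("sai", "pen", 100, 1),
    ("kumar", "car", 30, 2),
    ("sai", "pen", 100, 1),
    ("sai", "pen", 100, 1),
    ("ram", "bike", 288, 3),
    ("kumar", "car", 30, 2),
]
hits = [
    {"_id": doc_id, "_source": dict(zip(FIELDS, row))}
    for doc_id, row in zip("abcdef", rows)
]

keep_ids, delete_ids = split_duplicates(hits)
print("keep:", keep_ids)
print("delete:", delete_ids)
```

Which copy of each record is kept is arbitrary here (the first one seen); any one of the identical documents would do.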

Blason · June 22, 2022, 8:25am

It should not be an issue, I believe. What I would suggest is to create a visualization on the data, take the metric as Unique Count, and then select Terms for the fields on which it should find unique values. Then edit that visualization and inspect the request. Adjust your time range accordingly. It gives you a query, which is generally a GET.

Then you can modify it slightly for the delete API: where the request has search?q=, pass the query as a JSON object instead.
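
As a rough sketch of what such a query could look like, the Python below builds a search body that uses a terms aggregation with min_doc_count set to 2, so only key values occurring more than once in the chosen interval come back, plus a bulk request body that deletes documents by ID. The field names `timestamp` and `id` and the index name `sales` are assumptions, not taken from the actual mapping, and the helper function names are hypothetical. The search body would be sent to the index's `_search` endpoint and the second body to `_bulk`.

```python
import json

def duplicate_search_body(timestamp_field, timestamp_value, key_field, max_buckets=1000):
    """Search body that lists key values seen at least twice in one interval."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "query": {"term": {timestamp_field: timestamp_value}},
        "aggs": {
            "duplicate_keys": {
                "terms": {
                    "field": key_field,
                    "min_doc_count": 2,  # skip keys that occur only once
                    "size": max_buckets,
                },
                # Return the _id of the documents in each bucket.
                "aggs": {"docs": {"top_hits": {"size": 100, "_source": False}}},
            }
        },
    }

def bulk_delete_body(index, doc_ids):
    """Newline-delimited JSON for the bulk API: one delete action per ID."""
    return "".join(
        json.dumps({"delete": {"_index": index, "_id": doc_id}}) + "\n"
        for doc_id in doc_ids
    )

# Assumed names: field "timestamp", field "id", index "sales".
search_body = duplicate_search_body("timestamp", "23:15", "id")
print(json.dumps(search_body, indent=2))
print(bulk_delete_body("sales", ["c", "d", "f"]), end="")
```

For each bucket in the response, all but one of the returned IDs would go into the delete list. Deleting by query with the same filter would instead remove every matching document, including the copy that should stay. By default top_hits returns at most 100 documents per bucket, so a key with more copies than that would need several passes.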

Komal_42 · June 22, 2022, 9:38am

Can you please share the query for that?

system · July 20, 2022, 9:39am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
