How to identify and remove duplicates in Elasticsearch index

Sowmya1 · June 22, 2022, 10:40am

We are using Elasticsearch 7.11.1. We recently observed an issue, described below.

We store data at 15-minute intervals, and we take the timestamp from our input file (e.g. 05:00, 23:15, 20:30, 11:45).
Recently we noticed that our input file for 23:15 has 1890 records, but the index has 3533 records.
We now want to delete the 1643 duplicate records from the index without disturbing the 1890 original records.

We need an API query for that.

For example:

Input file:

name product sale id
sai pen 100 1
kumar car 30 2
sai pen 100 1
sai pen 100 1
ram bike 288 3
kumar car 30 2

After deleting the duplicates, my index should look like this:

name product sale id
sai pen 100 1
ram bike 288 3
kumar car 30 2

I need help with:

a query to find only the duplicates at 23:15
a query to delete the duplicates

Could you please share the API queries for the above issue?

Sandeep_Raju · June 22, 2022, 11:37am

One way to do this is by concatenating all 4 fields into one field; if the count for a value of that field is > 1, use delete by query to delete the duplicates.
If you have a time field such as timestamp or updated, I believe we can use delete by query for this.
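A rough sketch of the "concatenate and count" idea, written as a Python helper that only builds the search body. The field names (name.keyword, product.keyword, sale, id), the timestamp field name, and the timestamp value format are assumptions based on the example table, not something confirmed in this thread; adjust them to your mapping. It uses a terms aggregation with a Painless script and min_doc_count of 2, so only concatenated values that occur more than once come back as buckets.

```python
# Assumed field names, taken from the example table in the question.
# Text fields are assumed to have a ".keyword" sub-field for doc values.
ASSUMED_FIELDS = ("name.keyword", "product.keyword", "sale", "id")


def build_duplicate_search(timestamp, fields=ASSUMED_FIELDS, timestamp_field="timestamp"):
    """Build a search body that lists concatenated keys occurring more than once.

    `timestamp_field` and the format of `timestamp` are hypothetical; they
    depend on how the 23:15 value is actually stored in the index.
    """
    # Painless expression that joins the field values with '|'.
    script = " + '|' + ".join("doc['{}'].value".format(f) for f in fields)
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {timestamp_field: timestamp}},
        "aggs": {
            "duplicates": {
                "terms": {
                    "script": {"lang": "painless", "source": script},
                    "min_doc_count": 2,  # keep only keys seen at least twice
                    "size": 10000,
                }
            }
        },
    }


body = build_duplicate_search("2022-06-21T23:15:00")
```

The resulting body could be sent to the index's _search endpoint. It only identifies the duplicated keys; a delete by query built from those keys would remove every matching document, including the copy you want to keep.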

Sowmya1 · June 22, 2022, 1:25pm

Please help us with the query to delete the duplicate records at the 23:15 timestamp.

Christian_Dahlqvist · June 22, 2022, 1:35pm

Maybe this blog post might be useful? I am not sure there is a way to reliably create a query to use with delete by query to handle this, so the approach described in the blog post may be safer.
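A client-side approach sidesteps the issue that delete by query cannot keep one copy of each duplicate. The following is a minimal sketch under assumptions, not necessarily what the blog post describes: it hashes the fields that define a duplicate (field names assumed from the example table), keeps the first document ID seen for each hash, and builds delete actions for the rest. The function names are made up for illustration.

```python
import hashlib
from collections import defaultdict

# Assumed field names, taken from the example table in the question.
KEY_FIELDS = ("name", "product", "sale", "id")


def duplicate_key(source, fields=KEY_FIELDS):
    """Hash the values of the fields that define a duplicate."""
    joined = "|".join(str(source.get(f)) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_duplicate_ids(hits, fields=KEY_FIELDS):
    """Return the _id of every document except the first one seen per key."""
    groups = defaultdict(list)
    for hit in hits:
        groups[duplicate_key(hit["_source"], fields)].append(hit["_id"])
    extras = []
    for ids in groups.values():
        extras.extend(ids[1:])  # ids[0] is the copy that stays in the index
    return extras


def build_delete_actions(index, doc_ids):
    """Build bulk-style delete actions, one per surplus document."""
    return [{"_op_type": "delete", "_index": index, "_id": i} for i in doc_ids]


# Example using the rows from the question; the _id values are invented.
hits = [
    {"_id": "a", "_source": {"name": "sai", "product": "pen", "sale": 100, "id": 1}},
    {"_id": "b", "_source": {"name": "kumar", "product": "car", "sale": 30, "id": 2}},
    {"_id": "c", "_source": {"name": "sai", "product": "pen", "sale": 100, "id": 1}},
    {"_id": "d", "_source": {"name": "sai", "product": "pen", "sale": 100, "id": 1}},
    {"_id": "e", "_source": {"name": "ram", "product": "bike", "sale": 288, "id": 3}},
    {"_id": "f", "_source": {"name": "kumar", "product": "car", "sale": 30, "id": 2}},
]
extras = find_duplicate_ids(hits)
actions = build_delete_actions("my-index", extras)  # "my-index" is a placeholder
```

In practice the hits could be streamed with elasticsearch.helpers.scan, filtered to the 23:15 timestamp, and the actions passed to elasticsearch.helpers.bulk. Checking that the number of surplus IDs matches the expected 1643 before deleting anything would be a sensible safeguard.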

system · July 20, 2022, 1:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
