Duplicate results in resultset

David_3 · June 19, 2012, 3:08pm

I have the following issue: I have an index with denormalized data, and the search returns duplicated data. It seems there is no way in an Elasticsearch query to make the search results distinct and remove the duplicates. Is there a way to remove the duplicates using the Elasticsearch API?

Thanks,
David

kimchy · June 21, 2012, 8:09am

What constitutes a duplicate? In general, though, no: duplicates between docs in a single search request can't be filtered out, at least not easily.

Daniel_Schnell · June 21, 2012, 8:55am

What you could do depending on your data:

1.) Add a field to the index holding a hash value (e.g. MD5/SHA1) computed over the indexed fields, and filter out duplicate entries at query time after you get the results from ES.
2.) Don't add a hash field to the index, but calculate the hash for each returned doc at query time. This could hurt performance badly, depending on how many hits you want to show and how big the docs are.
3.) Similar to 1.), add the hash field, but post-process your data after importing it into ES and remove all duplicate entries by iterating over all id fields and using a more like this query on the hash field. For this you probably need to set the ids at insertion time and keep an external reference to them somewhere else, e.g. the SQL DB you imported the data from.
4.) Similar to 1.), add the hash field and check for duplicates with the more like this query before inserting new docs. This gives a very slow but probably safe import with respect to duplicates.
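A minimal client-side sketch of options 1 to 3 in Python, assuming the hits come in the usual `hits.hits` shape (each with `_id` and `_source`). Fetching from ES is left out, and no ES client calls are used. The field name `content_hash` and the helper names are hypothetical. For option 3 the sketch groups hits in memory instead of issuing a more like this query, which is a swapped-in approach, not the one described above.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Optional, Sequence

Hit = Dict[str, Any]  # one entry of hits.hits: {"_id": ..., "_source": {...}}


def content_hash(source: Dict[str, Any], fields: Optional[Sequence[str]] = None) -> str:
    """SHA-1 of a canonical JSON form of the chosen fields (all of _source if None)."""
    subset = source if fields is None else {f: source.get(f) for f in fields}
    # sort_keys and fixed separators make equal content serialize identically
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def dedupe_hits(hits: Iterable[Hit], hash_field: Optional[str] = None,
                fields: Optional[Sequence[str]] = None) -> List[Hit]:
    """Keep the first hit per distinct hash, preserving the order ES returned.

    Option 1: pass hash_field, the (hypothetical) name of a hash stored at index time.
    Option 2: leave hash_field as None and the hash is computed per hit here.
    """
    seen = set()
    unique = []
    for hit in hits:
        src = hit["_source"]
        h = src[hash_field] if hash_field else content_hash(src, fields)
        if h not in seen:
            seen.add(h)
            unique.append(hit)
    return unique


def duplicate_ids(hits: Iterable[Hit], hash_field: str) -> List[str]:
    """Option 3, done in memory: ids of docs whose hash was already seen (delete candidates)."""
    seen = set()
    dupes = []
    for hit in hits:
        h = hit["_source"][hash_field]
        if h in seen:
            dupes.append(hit["_id"])
        else:
            seen.add(h)
    return dupes


if __name__ == "__main__":
    hits = [
        {"_id": "1", "_source": {"title": "Foo", "city": "Berlin"}},
        {"_id": "2", "_source": {"city": "Berlin", "title": "Foo"}},  # same content as "1"
        {"_id": "3", "_source": {"title": "Bar", "city": "Berlin"}},
    ]
    print([h["_id"] for h in dedupe_hits(hits)])  # ['1', '3']
```

Because the filtering happens after ES returns a page, a page can end up with fewer than `size` unique hits, and the reported total still counts the duplicates; requesting a larger page than you display is one way around that. The `fields` argument is where you decide what constitutes a duplicate.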

bye,
Daniel.

akbari · April 29, 2014, 8:46am

Daniel Schnell, can you write an example, please? I have the same problem.

Related topics

Topic	Category	Replies	Views	Activity
7.x How do remove duplicates in search after query?	Elasticsearch	2	500	February 2, 2021
How to remove duplicate search result in elasticsearch?	Elasticsearch	5	1272	November 17, 2020
Any idea to remove the duplicates from the search results?	Elasticsearch	2	2521	July 6, 2017
Distinct results by field for a given query	Elasticsearch	5	950	July 6, 2017
Remove results with same id from Elasticsearch search result	Elasticsearch	2	400	December 11, 2020