Any idea to remove the duplicates from the search results?

Hi all,

I have some documents that look like:
{
"name": "",
"image_url": "",
"simHash": ""
}

Many of these documents are duplicates that share the same simHash value.
Any idea how to remove the duplicates from the search results at query time,
instead of removing them at indexing time?

I have seen solutions
here: http://stackoverflow.com/questions/24080846/removing-duplicates-from-search-results

However, I would like to know whether I can write code that runs inside
Elasticsearch to do this job.

Many thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5fb60e5b-8a88-4af3-b0c1-0c064b280c7e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

You are probably looking for the same thing that I was looking for a few days ago: https://groups.google.com/forum/#!searchin/elasticsearch/tugberk/elasticsearch/1uCQ7R8vCS8/-iRJLrdGGrYJ

The top hits aggregation ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-aggregations-metrics-top-hits-aggregation.html#search-aggregations-metrics-top-hits-aggregation ) may be what you are looking for. I had a working query here: https://groups.google.com/forum/#!msg/elasticsearch/1uCQ7R8vCS8/MKpbJjVM0SMJ but note that I wasn’t able to paginate over the aggregation results. At first glance that does not seem to be possible: https://github.com/elasticsearch/elasticsearch/issues/4915
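
For illustration, here is a rough sketch of what such a request could look like, written as a Python dict. This is not the query from the linked thread. It assumes that simHash is indexed as an exact (not_analyzed) value so that a terms aggregation can bucket on it. The helper names (build_dedup_query, extract_unique_hits) and the aggregation names (by_simhash, best_doc) are made up for this example.

```python
def build_dedup_query(user_query, max_groups=10):
    """Build a search body that returns one document per distinct simHash.

    Assumption: "simHash" is stored as a single exact term (not analyzed).
    """
    return {
        "query": user_query,
        # The ordinary hit list is not needed; results come from the buckets.
        "size": 0,
        "aggs": {
            "by_simhash": {
                # One bucket per distinct simHash value.
                "terms": {"field": "simHash", "size": max_groups},
                "aggs": {
                    # Keep only the best-scoring document of each bucket.
                    "best_doc": {"top_hits": {"size": 1}}
                },
            }
        },
    }


def extract_unique_hits(response):
    """Pull the single top hit out of every simHash bucket of a response."""
    buckets = response["aggregations"]["by_simhash"]["buckets"]
    return [
        bucket["best_doc"]["hits"]["hits"][0]
        for bucket in buckets
        if bucket["best_doc"]["hits"]["hits"]
    ]
```

The dict would be sent as the body of a normal search request. Note that terms buckets are ordered by document count by default, not by relevance, and that max_groups only caps the number of buckets; it does not give real pagination, which is the limitation mentioned above.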

Tugberk

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On Behalf Of Peiyong Lin
Sent: Friday, September 26, 2014 10:16 AM
To: elasticsearch@googlegroups.com
Subject: Any idea to remove the duplicates from the search results?

