Hi,
I am looking for a way to remove duplicated search results in ES, and I
would be grateful for anybody's help.
First, let me explain the requirement. I have indexed three documents. Each
has its own unique primary key, but all three share the same docid. Such
documents may be published by the same author at different times. If I
search for the related documents in ES, I get all three back, but I only
want the newest one. I need to remove the other, duplicated documents.
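
To make the requirement concrete, here is a minimal sketch of the result I am after, done on the client side after the search has returned. It is not a solution inside ES, only an illustration of the behaviour. The hit layout (`_id`, `_source`) follows the usual search response, `docid` is the shared field described above, and `published` is a hypothetical field name for the publication date, assumed to be an ISO-8601 string so that plain string comparison orders it correctly.

```python
from typing import Dict, Iterable, List


def newest_per_docid(hits: Iterable[dict]) -> List[dict]:
    """Keep only the most recently published hit for each docid.

    Assumes every hit has _source.docid and _source.published
    (hypothetical field name, ISO-8601 date string).
    """
    newest: Dict[str, dict] = {}
    for hit in hits:
        src = hit["_source"]
        current = newest.get(src["docid"])
        # Replace the stored hit only if this one is strictly newer.
        if current is None or src["published"] > current["_source"]["published"]:
            newest[src["docid"]] = hit
    # One hit per docid, ordered by first appearance of each docid.
    return list(newest.values())


if __name__ == "__main__":
    # Made-up sample hits: three versions of docid "A" and one of "B".
    sample = [
        {"_id": "1", "_source": {"docid": "A", "published": "2013-01-05"}},
        {"_id": "2", "_source": {"docid": "A", "published": "2013-03-10"}},
        {"_id": "3", "_source": {"docid": "B", "published": "2013-02-01"}},
        {"_id": "4", "_source": {"docid": "A", "published": "2013-02-20"}},
    ]
    for hit in newest_per_docid(sample):
        print(hit["_id"], hit["_source"]["docid"], hit["_source"]["published"])
```

Doing this on the client means the hit count and paging reported by ES still include the duplicates, which is why I would prefer to do it inside ES.
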
I tried to develop a custom plugin to implement this requirement, but in the
end I failed, because there seems to be no place for my plugin to run after
ES has collected all the search results. Has anyone encountered the same problem?
Other people have run into the same problem, as the links below show.
There is a filter in Lucene called DuplicateFilter, which can remove
duplicate values from search results. Maybe I can use this filter to
remove the articles that have the same author.
Please see the following link:
http://lucene.apache.org/core/4_0_0/sandbox/org/apache/lucene/sandbox/queries/DuplicateFilter.html
However, a Lucene filter cannot be used in ES directly. Some people have met
the same problem, and kimchy has given a solution. Please take a look at
the following link:
http://elasticsearch-users.115913.n3.nabble.com/Possible-to-use-Lucene-filters-td3375477.html
Some people also wanted to use DuplicateFilter in ES and asked kimchy
for help. The following link shows the details:
https://github.com/elasticsearch/elasticsearch/issues/1405
So we may have a way to solve our problem, but according to kimchy it is not
the best one.
In short, none of the approaches above is a perfect solution. Has anybody
else met the same problem?
Thanks,
ming