How to get similar, but not too similar, documents?

Illidan · October 22, 2017, 4:25am

I have a requirement to retrieve similar documents, but not ones that are too similar: our database contains many almost identical documents, which we want to skip in search.

I have been looking through the "More Like This" documentation and can't find anything that limits the similarity rating. How can I achieve that, i.e. find similar documents, but not ones that are too similar? I would like a maximum similarity of 80%.

spinscale · October 23, 2017, 8:37am

I'm not sure I understand your requirement, but the max_doc_freq parameter might help a little, though it is mainly intended to filter out stopwords.
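
For illustration, here is a minimal sketch of a more_like_this request body that sets max_doc_freq. The index name, document id, field names and the threshold of 1000 are placeholders chosen for the example, not values from this thread, and the helper function is hypothetical. It only builds the JSON body; sending it to a cluster is left out.

```python
import json


def mlt_body_with_max_doc_freq(index, doc_id, fields, max_doc_freq):
    """Build a more_like_this search body (hypothetical helper).

    Terms that occur in more than `max_doc_freq` documents are ignored
    when the query terms are selected from the source document.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # Use an already indexed document as the "like" input.
                "like": [{"_index": index, "_id": doc_id}],
                "max_doc_freq": max_doc_freq,
            }
        }
    }


# Placeholder values, assumed for the example only.
body = mlt_body_with_max_doc_freq("articles", "1", ["title", "body"], 1000)
print(json.dumps(body, indent=2))
```

Note that this only changes which terms are used for matching. It does not put an upper bound on how similar the returned documents can be.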

Illidan · November 17, 2017, 10:32am

Actually "MaxQueryTerms" seems to give me what I need. Thanks anyway!
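
For anyone finding this later: "MaxQueryTerms" presumably refers to the max_query_terms parameter of the more_like_this query (the default is 25). Below is a rough sketch of a request body that lowers it, assuming the goal is a coarser match built from fewer terms. The index name, document id, field names and the value 5 are made up for the example, and the helper function is hypothetical.

```python
import json


def mlt_body_with_max_query_terms(index, doc_id, fields, max_query_terms):
    """Build a more_like_this search body (hypothetical helper).

    `max_query_terms` caps how many terms are selected from the source
    document to form the query.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # Use an already indexed document as the "like" input.
                "like": [{"_index": index, "_id": doc_id}],
                "max_query_terms": max_query_terms,
            }
        }
    }


# Placeholder values, assumed for the example only.
body = mlt_body_with_max_query_terms("articles", "1", ["title", "body"], 5)
print(json.dumps(body, indent=2))
```

The right value depends on the data, so it is worth trying a few settings against a sample of known near-duplicates.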

system · December 15, 2017, 10:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
