How to retrieve similar documents that are not too similar?

I have a requirement to retrieve similar documents, but not ones that are too similar (our database contains many almost identical documents, which we want to skip in search).

I have been looking through the "More Like This" documentation, and I can't find anything that limits the similarity rating. How can I achieve that, i.e. find similar documents, but not ones that are too similar? I would like a maximum similarity of 80%.

I'm not sure I fully understand your requirement, but the max_doc_freq parameter might help a little, although it is mainly intended for filtering out stop words.
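For illustration, here is a minimal sketch of a more_like_this request body that sets max_doc_freq, built as a plain Python dict. The index name "articles", the document id "1", the field names and the threshold of 1000 are all made-up placeholders, and the helper build_mlt_query is hypothetical, not part of any client library.

```python
import json


def build_mlt_query(index, doc_id, fields, max_doc_freq):
    """Build a more_like_this search body that ignores very common terms."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # Use an already indexed document as the "like" example.
                "like": [{"_index": index, "_id": doc_id}],
                # Terms that occur in more than this many documents are
                # not used when building the query.
                "max_doc_freq": max_doc_freq,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; adjust them to your own index and mapping.
    body = build_mlt_query("articles", "1", ["title", "body"], max_doc_freq=1000)
    print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a search request. Note that this only changes which terms are picked for the query; it does not put an upper bound on how similar the hits can be.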

Actually "MaxQueryTerms" seems to give me what I need. Thanks anyway!
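For anyone finding this later, a rough sketch of what that could look like in the query DSL, where the parameter is spelled max_query_terms (default 25). I am assuming "MaxQueryTerms" is the client-side name for the same setting. The helper name, the field names, the sample text and the value 5 are placeholders for illustration only.

```python
import json


def build_mlt_with_term_limit(fields, like_text, max_query_terms):
    """Build a more_like_this search body that uses only a few query terms."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # Free text to find similar documents for.
                "like": like_text,
                # Upper bound on how many terms are selected from the input.
                # Fewer terms should make the match looser.
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; adjust them to your own mapping and data.
    body = build_mlt_with_term_limit(
        ["title", "body"], "some example text to compare against", max_query_terms=5
    )
    print(json.dumps(body, indent=2))
```

This is not a hard 80% cap. It only reduces how many terms drive the match, so the right value has to be found by trying it against your own data.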
