I'm using MLT and trying to change the percent_terms_to_match parameter, but
nothing changes.
The values 0.3, 0.5, and 1
all return the same results, including docs that share only 1 matching term.
Is this correct?
It is impossible to say without knowing your data.
I suggest you play with min_term_freq and min_doc_freq; you're probably
testing this on a small number of docs.
I can tell you that we're using MLT with great success in production.
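
For reference, here is a minimal sketch of a request body that sets these parameters together. It assumes the Elasticsearch more_like_this query in an older release that still accepts percent_terms_to_match and like_text (newer releases use minimum_should_match instead). The field name "hashes" and the helper build_mlt_query are hypothetical, chosen only for illustration.

```python
import json


def build_mlt_query(like_text, field="hashes", percent_terms_to_match=0.5,
                    min_term_freq=1, min_doc_freq=1, max_query_terms=25):
    """Build a more_like_this request body as a plain dict.

    Assumption: parameter names follow the older Elasticsearch MLT query.
    The field name "hashes" is hypothetical.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                "like_text": like_text,
                # Fraction of the selected query terms that must match.
                "percent_terms_to_match": percent_terms_to_match,
                # Lowered to 1 so that rare terms on a small test index
                # are not discarded before the query is built.
                "min_term_freq": min_term_freq,
                "min_doc_freq": min_doc_freq,
                # Upper bound on how many terms are picked from like_text.
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder text; real input would be the space-separated document body.
    body = build_mlt_query("term1 term2 term3", percent_terms_to_match=1)
    print(json.dumps(body, indent=2))
```

Setting min_term_freq and min_doc_freq to 1 is only a starting point for experiments on a small index; sensible values depend on the data.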
My data is just a set of MD5 strings separated by spaces. These are MD5 hashes of my
specific objects. There are 30 to 500 different hashes (terms?) in one doc, with a
total of 160 to 2500 terms (one hash may be repeated from 1 to 10 times).
I need to find docs where more than just 1 or 2 terms (MD5 hashes)
match.
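
To make the requirement concrete, the following sketch counts how many distinct hashes two docs share and keeps only candidates above a threshold. It runs entirely client-side and is not how MLT scores documents; it is only a way to check expected results on a small sample. The function name similar_docs and the min_shared threshold are hypothetical.

```python
def similar_docs(target, candidates, min_shared=3):
    """Return (doc_id, shared_count) pairs, most shared hashes first.

    target      -- space-separated hashes of the reference doc
    candidates  -- dict mapping doc_id to space-separated hashes
    min_shared  -- hypothetical threshold: minimum distinct hashes in common
    """
    # Repeated hashes collapse to one entry, so only distinct terms count.
    target_terms = set(target.split())
    results = []
    for doc_id, text in candidates.items():
        shared = len(target_terms & set(text.split()))
        if shared >= min_shared:
            results.append((doc_id, shared))
    # Sort by shared count, highest first.
    return sorted(results, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Short placeholder tokens stand in for real MD5 strings.
    docs = {"x": "a b c z", "y": "a q", "z": "a b c d d"}
    print(similar_docs("a b c d a", docs, min_shared=3))
```

Comparing this list against what the MLT query returns for the same doc can show whether the query is really matching on a single term.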