I am trying to use the More Like This (MLT) query to find similar documents and have run into two issues:
- I set min_should_match to 70% and got results where documents matched on only a single word. Strangely, these words are common ones such as "and" or "the". Here is an example from the output with explain:true:
"weight(title:and in 0) [PerFieldSimilarity], result of:". The matched document contains words like "celebrates" and "birthday", which I assume have a higher IDF than "and". So how can I control which clauses MLT chooses to search on (apart from the doc_freq, term_freq and word_length settings)? I found that if I set min_should_match to an integer, I get more relevant results.
- The matching is not symmetric. For example, the document "aaa bbb ccc" can match the document "aaa bbb ccc ddd eee", but "aaa bbb ccc ddd eee" does not match "aaa bbb ccc".
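For reference, a request of the kind described above might look roughly like the sketch below. It builds the search body as a plain Python dict and prints it as JSON. The `title` field is taken from the explain output; the sample text and the helper name `build_mlt_body` are hypothetical. The term-selection parameters are written out at what I understand to be their documented defaults, and the documented name of the parameter is `minimum_should_match`.

```python
import json


def build_mlt_body(like_text, minimum_should_match="70%"):
    """Return a hypothetical more_like_this search body as a plain dict (sketch only)."""
    return {
        "explain": True,  # per-hit score breakdown, as used in the question
        "query": {
            "more_like_this": {
                "fields": ["title"],  # field seen in the explain output
                "like": like_text,
                # Term-selection filters, written out at their documented defaults:
                "min_term_freq": 2,     # term must occur at least twice in the input text
                "min_doc_freq": 5,      # term must appear in at least 5 indexed documents
                "max_query_terms": 25,  # at most 25 selected terms become query clauses
                "min_word_length": 0,
                # Counted against the *selected* terms, not the candidate document's terms.
                "minimum_should_match": minimum_should_match,
            }
        },
    }


# Hypothetical input text; compare a percentage with a fixed integer.
print(json.dumps(build_mlt_body("The town celebrates its birthday"), indent=2))
print(json.dumps(build_mlt_body("The town celebrates its birthday", 2), indent=2))
```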
I tried changing the default similarity to BM25 and to classic, but it had no effect.
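The symptoms above may be related to how a percentage minimum_should_match is documented to work. It is taken as a share of the terms that MLT selected from the *source* document, and the result is rounded down. The toy model below is an illustration under stated assumptions, not Elasticsearch code. `required_matches` and `toy_mlt_match` are hypothetical helpers. The model treats every distinct word of the source as a selected term and ignores the real filters (min_term_freq, min_doc_freq, max_query_terms). Because it only counts matching clauses, it also suggests why changing the similarity, which affects scoring, would not change which documents match.

```python
def required_matches(num_terms, spec):
    """Minimum number of selected terms a candidate must contain (toy model).

    Assumption: a positive whole-number percentage such as "70%" is applied to the
    number of selected terms and rounded down; an int is a fixed value.
    Negative and combination specs are not handled.
    """
    if isinstance(spec, int):
        return spec
    percent = int(spec.strip().rstrip("%"))
    return num_terms * percent // 100  # integer floor avoids float rounding surprises


def toy_mlt_match(source, target, spec):
    # Toy assumption: every distinct word of the source is a selected query term.
    selected = set(source.split())
    overlap = len(selected & set(target.split()))
    # A bool query made only of should clauses still needs at least one match.
    return overlap >= max(1, required_matches(len(selected), spec))


short = "aaa bbb ccc"
long_doc = "aaa bbb ccc ddd eee fff ggg hhh iii jjj"  # hypothetical 10-word document

print(required_matches(1, "70%"))             # 0
print(toy_mlt_match(short, long_doc, "70%"))  # True: 3 selected, needs 2
print(toy_mlt_match(long_doc, short, "70%"))  # False: 10 selected, needs 7
print(toy_mlt_match(long_doc, short, 3))      # True: fixed requirement of 3
```

In this model, a single selected term gives a requirement of 0 (one matching clause is still needed). That is one way a match on just "and" could satisfy 70%. For the 5-word document in the question, 70% of 5 rounds down to 3, so the toy model alone predicts a match in both directions. The 10-word document is hypothetical and only shows how the requirement grows with the length of the source.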