After adding references to this analyzer in my mappings, my multi_match queries began returning unexpected results. Specifically, I index documents that look like this:
{
  "foo": {
    "type": "bob",
    "value": "sam"
  }
}
My multi_match is set to query explicitly on "foo.value". When I search on sa/sam it works; however, since applying these settings, searches for bo/bob return matches as well. Since I'm only searching on foo.value, I wouldn't expect this.
I just realized after posting this that the scores for this case were all extremely low. It seems like ES is finding partial matches on the search, but nothing concrete. To work around this, I'm going to set min_score to 0.5 and see how it goes.
So you are searching for FooBar and it gives you back bob? Or are these completely unrelated examples? Sorry, I still cannot figure out what you are trying to do and what does and doesn't work. I would be glad to help if I could easily recreate the issue on my machine. Please see https://www.elastic.co/help for some suggestions about how to make your questions easier to understand. Thanks!
Yes, this is the verbatim search content/terms. It's purposefully stupid sounding, but I did verify the issue exists with this content.
As best as I can tell, we're getting a very low score because a substring of "FooBar" matches against "bob", particularly the "ob" part. I'm not sure if this is the intended behavior for ngrams (to also break up the search term), but this is the only thing I can surmise. I would have expected a score of 0, but if it is breaking up the search term into ngrams as well, this makes some sense to me.
Ok, now it makes more sense. Indeed, by default the same analyzer is used for both indexing and searching. So, during search, the search term is also tokenized into n-grams, and because multi_match applies the "OR" operator to all tokens by default, it will match any field that has at least one matching n-gram present. To solve this problem, you need to replace the search analyzer with an analyzer that does not include the ngram filter.
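To make the mechanism concrete, here is a rough plain-Python approximation of what happens. It is not Elasticsearch's actual implementation: the `ngrams` and `or_match` helpers are hypothetical, and the gram sizes (2 to 3) are an assumption, since the analyzer settings are not shown in this thread.

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Lowercase the text and return its set of character n-grams.

    min_gram/max_gram are assumed values, not taken from the thread.
    """
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def or_match(query, field_value, analyze_query=True):
    """Return the tokens shared by the query and the indexed field.

    With OR semantics, any shared token is enough to produce a hit.
    analyze_query=True mimics using the n-gram analyzer at search time;
    False mimics a search analyzer that only lowercases the term.
    """
    field_tokens = ngrams(field_value)
    query_tokens = ngrams(query) if analyze_query else {query.lower()}
    return sorted(query_tokens & field_tokens)


# Same n-gram analyzer at index and search time: "ob" is shared, so "bob" matches.
print(or_match("FooBar", "bob"))
# Search term left whole: nothing is shared, so there is no hit.
print(or_match("FooBar", "bob", analyze_query=False))
# Partial input still matches, because the indexed side is still n-grammed.
print(or_match("sa", "sam", analyze_query=False))
```

The last call shows why dropping the ngram filter on the search side does not break partial matching: the index still contains the n-grams, so a whole search term like "sa" can still line up with one of them.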
Sorry, just realized that the link that I posted above was pointing to the wrong page. The search analyzer can be set by using the search_analyzer parameter in the field mapping. So in your case it would look like this:
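A minimal sketch of such a mapping, written as a Python dict that serializes to the JSON request body. The analyzer name `my_ngram_analyzer` is hypothetical (the real name from the index settings is not shown in this thread), `standard` is just one possible choice of search analyzer without n-grams, and the `text` field type assumes Elasticsearch 5 or later (older versions used `string`).

```python
import json

# Hypothetical name: the thread does not show what the n-gram analyzer is called.
NGRAM_ANALYZER = "my_ngram_analyzer"


def build_mapping(index_analyzer, search_analyzer="standard"):
    """Build a mapping where foo.value is n-grammed at index time only."""
    return {
        "properties": {
            "foo": {
                "properties": {
                    "value": {
                        "type": "text",  # assumes ES 5+; older versions used "string"
                        "analyzer": index_analyzer,          # applied when indexing
                        "search_analyzer": search_analyzer,  # applied to the query text
                    }
                }
            }
        }
    }


mapping = build_mapping(NGRAM_ANALYZER)
print(json.dumps(mapping, indent=2))
```

With this in place, the query text is analyzed without n-grams, so a search term has to line up with an indexed n-gram as a whole rather than through any shared fragment.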