Hello, I'm trying to understand and improve search results.
The analyzer uses ngrams with min=3, max=3, language='Swedish', and filter=['lowercase'].
For search, fuzziness is set to 1.
It doesn't work badly, but when we search for "Frisor" it returns results containing "Massör".
Can anyone explain why this happens? I would like to exclude results like this, but I also want to understand why they show up with a fuzziness of 1 when the two words differ by more than one character.
I would set a minimum score for the search, but we pass an array of multiple words to the analyzer and use the "and" operator, so the scores are all pretty similar.
Could you post the actual mapping in JSON? In general, using a language analyzer with ngrams is going to make things weird, and using fuzziness with ngrams is a bit odd too. I'm sure it does something, but what it does is fairly complicated.
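To get a feel for why "Frisor" can match "Massör", here is a rough Python sketch. It assumes the fuzziness is applied to each 3-character ngram token separately (a fuzzy match query applies fuzziness to the individual terms the analyzer produces, not to the whole word), and it uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition as a single edit; that difference doesn't change the result for these two words. The helper names are made up for illustration.

```python
def trigrams(word: str, n: int = 3) -> list[str]:
    """Lowercase the word and split it into overlapping n-grams (min = max = n)."""
    w = word.lower()
    return [w[i:i + n] for i in range(len(w) - n + 1)]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fuzzy_matches(query: str, doc: str, fuzziness: int = 1, n: int = 3) -> list[tuple[str, str]]:
    """Pairs of (query ngram, document ngram) within the allowed edit distance."""
    doc_grams = trigrams(doc, n)
    return [(q, d) for q in trigrams(query, n) for d in doc_grams
            if levenshtein(q, d) <= fuzziness]


print(trigrams("frisor"))                # ['fri', 'ris', 'iso', 'sor']
print(trigrams("massör"))                # ['mas', 'ass', 'ssö', 'sör']
print(fuzzy_matches("frisor", "massör")) # [('sor', 'sör')]
```

The whole words are far apart, but "sor" and "sör" are only one substitution away, and with 3-character tokens a single edit already changes a third of the token. Whether that one shared ngram is enough to return the document depends on the actual mapping and query (for example, how the "and" operator ends up applied to the ngram tokens), which is why the mapping JSON would help.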
You'll probably get the best results from just the language analyzer. Fuzziness may or may not provide better hits. It'll certainly provide more hits, but they might not make any sense. The usual thing to do is to run the search and use something like the phrase suggester to suggest better search terms if any are available, but the phrase suggester is a bit difficult to tune effectively. Have a look at how it is tuned here for a starting place.
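The approach described above (language analyzer only, plus a phrase suggester) could look roughly like the following request bodies, written as Python dicts. This is only a sketch: the `title` field, the `title.shingles` subfield, the `title_shingles` filter, the `shingle_analyzer` analyzer and the `did_you_mean` suggestion name are all hypothetical, and the shingle sizes are a starting point rather than tuned values. The `swedish` analyzer is Elasticsearch's built-in language analyzer.

```python
import json

# Hypothetical index: "title" is searched with the built-in swedish analyzer
# (no ngrams); "title.shingles" stores lowercase word shingles for the suggester.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "title_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "title_shingles"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "swedish",
                "fields": {"shingles": {"type": "text", "analyzer": "shingle_analyzer"}},
            }
        }
    },
}


def build_search_body(query_text: str) -> dict:
    """Plain match query (no fuzziness) plus a phrase suggestion for the same text."""
    return {
        "query": {"match": {"title": {"query": query_text, "operator": "and"}}},
        "suggest": {
            "text": query_text,
            "did_you_mean": {
                "phrase": {
                    "field": "title.shingles",
                    "size": 1,
                    "direct_generator": [
                        {"field": "title.shingles", "suggest_mode": "always"}
                    ],
                }
            },
        },
    }


print(json.dumps(build_search_body("frisor"), indent=2))
```

The shingle filter keeps single words as well by default (`output_unigrams`), so the subfield can serve both the direct generator and the phrase scoring. Showing a "did you mean" suggestion keeps the main result list strict instead of widening it with fuzzy ngram matches.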