How to improve search results with fuzziness enabled?


(Pkulas) #1

Hello, I'm trying to understand and improve search results.
The analyzer uses ngrams with min=3, max=3, language='Swedish', filter=['lowercase'].

For search, fuzziness is set to 1.
It doesn't work badly, but when we search for the query "Frisor" the results include "Massör".
Can anyone explain why this happens? I would like to exclude results like this, but also understand why it shows up with a fuzziness of 1 when more than one character differs.

I would set a minimum score for the search, but we use an array of multiple words for the analyzer and the "and" operator, so the scores are all pretty similar.


(Nik Everett) #2

Could you post the actual mapping in JSON? In general using a language analyzer with ngrams is going to make things weird. And using fuzziness with ngrams is a bit odd too. I'm sure it does something, but what it does is fairly complicated.
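
One plausible explanation, sketched in plain Python rather than in Elasticsearch itself: with 3-grams, fuzziness is applied to each trigram token instead of to the whole word, so a single trigram of the query only has to be within one edit of a single trigram of the indexed text. The helper names below (`ngrams`, `edit_distance`, `fuzzy_pairs`) are made up for illustration, and the sketch ignores whatever the Swedish language filters do to the tokens, so treat it as a rough model and not as what your index actually does.

```python
def ngrams(text, n=3):
    """Lowercase the text and split it into overlapping n-character tokens."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute).

    Elasticsearch also counts a transposition as one edit by default;
    that difference does not matter for this particular example.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fuzzy_pairs(query, indexed, max_edits=1):
    """Return (query trigram, indexed trigram) pairs within max_edits."""
    return [(q, t)
            for q in ngrams(query)
            for t in ngrams(indexed)
            if edit_distance(q, t) <= max_edits]


print(ngrams("Frisor"))                    # ['fri', 'ris', 'iso', 'sor']
print(ngrams("Massör"))                    # ['mas', 'ass', 'ssö', 'sör']
print(fuzzy_pairs("Frisor", "Massör"))     # [('sor', 'sör')]
print(edit_distance("frisor", "massör"))   # 4
```

So the whole words are four edits apart, but "sor" and "sör" are only one edit apart. Whether that single trigram is enough to produce a hit depends on the operator and `minimum_should_match` of the query, which is why seeing the actual mapping and query would help.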


(Pkulas) #3

Mapping:

{
  "clinic" : {
    "mappings" : {
      "practitioner" : {
        "properties" : {
          "absolute_url" : {
            "type" : "string",
            "index" : "no"
          },
          "booking_last_month" : {
            "type" : "boolean"
          },
          "clinic_city" : {
            "type" : "string",
            "index" : "no"
          },
          "clinic_id" : {
            "type" : "integer"
          },
          "clinic_logo" : {
            "type" : "boolean"
          },
          "clinic_name" : {
            "type" : "string",
            "index" : "no"
          },
          "clinic_photo" : {
            "type" : "boolean"
          },
          "default_service_price" : {
            "type" : "integer"
          },
          "enabled" : {
            "type" : "boolean"
          },
          "filter_availability" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "integration" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "location" : {
            "type" : "geo_point"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "pk" : {
            "type" : "long"
          },
          "practitioner_id" : {
            "type" : "integer"
          },
          "practitioner_photo" : {
            "type" : "boolean"
          },
          "q" : {
            "type" : "string",
            "analyzer" : "swedish_ngram_analyzer"
          },
          "q2" : {
            "type" : "string",
            "analyzer" : "swedish_ngram_analyzer"
          },
          "score_boost" : {
            "type" : "float"
          }
        }
      }
    }
  }
}

Which would be best? A language analyzer plus just fuzziness?


(Nik Everett) #4

The best performance is going to be just the language analyzer. Fuzziness may or may not provide better hits. It'll certainly provide more hits, but they might not make any sense. The usual thing to do is to run the search and use something like the phrase suggester to suggest better search terms if any are available, but the phrase suggester is a bit difficult to tune effectively. Have a look at how it is tuned here for a starting place.
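
As a rough sketch of what "run the search and also ask the phrase suggester" could look like, here is a Python function that only builds the request body as a dict. The field names `q_lang` and `q_lang.shingle` are hypothetical (they are not in the mapping above): the first is assumed to use a plain language analyzer, the second a shingle analyzer for the suggester. The values for `size` and `max_errors` are placeholders, not tuned settings.

```python
import json


def build_search_body(user_text,
                      text_field="q_lang",             # hypothetical field name
                      shingle_field="q_lang.shingle"):  # hypothetical field name
    """Build a search body with a plain match query and a phrase suggester."""
    return {
        # Plain match on the language-analyzed field, no ngrams, no fuzziness.
        "query": {
            "match": {
                text_field: {"query": user_text, "operator": "and"}
            }
        },
        # Ask for one "did you mean" candidate for the same text.
        "suggest": {
            "text": user_text,
            "did_you_mean": {
                "phrase": {
                    "field": shingle_field,
                    "size": 1,
                    "max_errors": 1,
                    "direct_generator": [
                        {"field": text_field, "suggest_mode": "missing"}
                    ],
                }
            },
        },
    }


body = build_search_body("Frisor")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The application would show the hits as usual and, if the suggester returns an option, offer it as an alternative search term instead of silently widening the match.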

