How to improve search results with fuzziness on?

pkulas · September 29, 2016, 11:02am

Hello, I'm trying to understand and improve search results.
The analyzer uses ngrams with min=3, max=3, language='Swedish', filter=['lowercase'].

For searching, fuzziness is set to 1.
It doesn't work badly, but when we search for "Frisor" the results include "Massör".
Can anyone explain why this happens? I would like to exclude results like this, but I also want to understand why they show up with a fuzziness of 1 when more than one character is different.

I would set a minimum score for the search, but we use an array of multiple words with the analyzer and the "and" operator, so the scores are all fairly similar.

nik9000 · September 29, 2016, 6:09pm

Could you post the actual mapping in JSON? In general, using a language analyzer with ngrams is going to make things weird, and using fuzziness with ngrams is a bit odd too. I'm sure it does something, but what it does is fairly complicated.
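To see how "Frisor" can match "Massör", here is a rough Python sketch. It assumes (the thread doesn't confirm this) that each word is lowercased and split into 3-character grams, and that fuzziness 1 is applied to each query gram on its own, measured as plain Levenshtein distance. The whole words are 4 edits apart, but the grams "sor" and "sör" differ by one character, so under these assumptions one query gram matches one indexed gram. Whether that alone returns the document also depends on the query's operator and on what else is indexed in "q", which the thread doesn't show.

```python
# Illustration only: approximates fuzzy matching when both the query and the
# indexed text are split into 3-character grams. Assumptions (not from the
# thread): grams are built per lowercased word, and fuzziness is plain
# Levenshtein distance applied to each gram separately.


def trigrams(word: str) -> list[str]:
    """Split a lowercased word into overlapping 3-character grams."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_gram_matches(query: str, indexed: str, fuzziness: int = 1):
    """Return (query_gram, indexed_gram) pairs within `fuzziness` edits."""
    return [(q, d)
            for q in trigrams(query)
            for d in trigrams(indexed)
            if levenshtein(q, d) <= fuzziness]


print("whole words:", levenshtein("frisor", "massör"))   # 4
print("frisor grams:", trigrams("Frisor"))  # ['fri', 'ris', 'iso', 'sor']
print("massör grams:", trigrams("Massör"))  # ['mas', 'ass', 'ssö', 'sör']
print("matching grams:", fuzzy_gram_matches("Frisor", "Massör"))  # [('sor', 'sör')]
```

Elasticsearch also counts a swap of two adjacent characters as a single edit by default (fuzzy_transpositions), so the real set of matching grams can be somewhat larger than what this sketch finds.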

pkulas · September 30, 2016, 3:41pm

Mapping:

{
  "clinic" : {
    "mappings" : {
      "practitioner" : {
        "properties" : {
          "absolute_url" : {
            "type" : "string",
            "index" : "no"
          },
          "booking_last_month" : {
            "type" : "boolean"
          },
          "clinic_city" : {
            "type" : "string",
            "index" : "no"
          },
          "clinic_id" : {
            "type" : "integer"
          },
          "clinic_logo" : {
            "type" : "boolean"
          },
          "clinic_name" : {
            "type" : "string",
            "index" : "no"
          },
          "clinic_photo" : {
            "type" : "boolean"
          },
          "default_service_price" : {
            "type" : "integer"
          },
          "enabled" : {
            "type" : "boolean"
          },
          "filter_availability" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "integration" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "location" : {
            "type" : "geo_point"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "pk" : {
            "type" : "long"
          },
          "practitioner_id" : {
            "type" : "integer"
          },
          "practitioner_photo" : {
            "type" : "boolean"
          },
          "q" : {
            "type" : "string",
            "analyzer" : "swedish_ngram_analyzer"
          },
          "q2" : {
            "type" : "string",
            "analyzer" : "swedish_ngram_analyzer"
          },
          "score_boost" : {
            "type" : "float"
          }
        }
      }
    }
  }
}

Which would be best? The language analyzer plus fuzziness on its own?
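For comparison, here is a sketch of what "language analyzer plus fuzziness" could look like, written as Python dicts for the request bodies. It assumes the built-in "swedish" analyzer replaces the custom swedish_ngram_analyzer on "q" and "q2"; the field type "string" matches the 2.x mapping above (later versions use "text"). An existing field's analyzer generally can't be changed in place, so this would mean creating a new index with this mapping and reindexing.

```python
import json

# Hypothetical field definitions: built-in "swedish" analyzer instead of the
# custom swedish_ngram_analyzer. "string" matches the 2.x mapping above.
mapping_update = {
    "properties": {
        "q": {"type": "string", "analyzer": "swedish"},
        "q2": {"type": "string", "analyzer": "swedish"},
    }
}

# Hypothetical search body: the match query analyzes "frisor" with the
# field's analyzer, then allows 1 edit per analyzed term (a whole word),
# not 1 edit per 3-character gram as with the ngram analyzer.
search_body = {
    "query": {
        "match": {
            "q": {
                "query": "frisor",
                "operator": "and",
                "fuzziness": 1,
            }
        }
    }
}

print(json.dumps(mapping_update, indent=2))
print(json.dumps(search_body, indent=2))
```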

nik9000 · October 1, 2016, 11:52am

The best performance is going to come from just the language analyzer. Fuzziness may or may not provide better hits. It'll certainly provide more hits, but they might not make any sense. The usual approach is to run the search and use something like the phrase suggester to suggest better search terms if any are available, but the phrase suggester is a bit difficult to tune effectively. Have a look at how it is tuned here for a starting place.
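A minimal sketch of "run the search and use the phrase suggester", assuming a single request carrying both a plain match query (no fuzziness) and a phrase suggestion on the "q" field. "q_fix" is an arbitrary name chosen here, and tuning options such as confidence, max_errors or direct_generator are left at their defaults, so treat this as a starting shape rather than a tuned setup.

```python
import json

user_input = "frisor"

# Hypothetical search body: exact (non-fuzzy) match on "q", plus a phrase
# suggestion computed from the same input text.
search_body = {
    "query": {
        "match": {"q": {"query": user_input, "operator": "and"}}
    },
    "suggest": {
        "q_fix": {                    # arbitrary suggestion name
            "text": user_input,
            "phrase": {
                "field": "q",
                "size": 1,            # return only the best correction
                "gram_size": 1,       # field has no shingles
            },
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The application would then read the "suggest" section of the response and, if the top option differs from what the user typed, offer it as a "did you mean" link instead of widening the query with fuzziness.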

Related topics

Fuzzy match query unexpected results (Elasticsearch): 3 replies, 1345 views, July 5, 2017
Searching by ngrams (Elasticsearch, elastic-stack-monitoring): 10 replies, 244 views, June 16, 2023
Achieve autocomplete with fuzziness (Elasticsearch): 1 reply, 333 views, March 28, 2019
Fuzzy query don't working as expected (Elasticsearch): 3 replies, 659 views, March 9, 2023
Fuzziness and analysis (Elasticsearch): 1 reply, 457 views, March 9, 2018