Inaccurate results using fuzzy queries



The results of my query should be ordered as follows:

  • 1: query term exactly matches a whole word
  • 2: query term exactly matches part of a word
  • 3: inexact matches using Levenshtein distance
  • 4: other inexact matches

So far I have covered requirements 1, 3, and 4 with the following query:

# query 1
GET /articles/article/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "description": {
              "query": "waschbecken",
              "operator": "and",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "waschbecken",
              "fuzziness": "AUTO",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}

Boosting the first query clause by 2 ensures that exact matches score higher than results returned by the second, fuzzy clause. Running this query with "handwaschbecken" or the misspelled word "handwaschbeken" returned accurate results in both cases, like this:

  • Handwaschbecken
  • Waschbeckenunterschrank Leon
  • Waschbeckendusche Universal Weiss
  • Handwaschbecken Nau Weiss
  • Jan Waschbecken-Unterschrank
  • VB Omnia Handwaschbecken

Now I also need to cover requirement 2 (exact matches where the query term is part of a word). To do this, I copied the "description" field to a new field, "description_ngram", which uses the same analyzer with an additional trigram filter.
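
For readers who want to reproduce the setup, the copy-plus-trigram arrangement described above might look roughly like the sketch below. The actual index settings are not shown in this post, so the filter name "trigram_filter", the analyzer name "description_trigram", the standard tokenizer, and the lowercase filter are all assumptions for illustration. The mapping syntax (a type named "article", field type "text") also assumes an Elasticsearch version that still supports mapping types.

```
# sketch: index settings and mapping (names and analyzer chain are assumptions)
PUT /articles
{
  "settings": {
    "analysis": {
      "filter": {
        "trigram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "description_trigram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "trigram_filter"]
        }
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "description": {
          "type": "text",
          "copy_to": "description_ngram"
        },
        "description_ngram": {
          "type": "text",
          "analyzer": "description_trigram"
        }
      }
    }
  }
}
```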

In the following query I try to use the field "description_ngram" to cover the inexact results:

# query 2
GET /articles/article/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "description": {
              "query": "waschbecken",
              "operator": "and",
              "fuzziness": "0",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "description_ngram": {
              "query": "waschbecken",
              "operator": "and",
              "fuzziness": "0"
            }
          }
        }
      ]
    }
  }
}

Requirements 1 and 2 are now covered, but I have trouble getting accurate inexact matches, e.g. when using the misspelled term "waschbeken" in the query. I would expect to get results similar to the ones listed above.

For “waschbeken” query 1 returns results like:

  • Fliesen-Rollen-Waschset
  • Nigrin-Waschset
  • Fliesen-Waschset
  • sauberlaufmatte Fame home braun, waschbar
  • Jan Waschbecken-Unterschrank
  • comatte Fashion Holzkiste waschbar

For “waschbeken” query 2 returns results like:

  • Qualitaets-Waschtischbefestig.
  • Schallschutz-Set f.Waschtisch
  • Waschbeckendusche Universal Weiss
  • Kon.Waschbeckendichtung Weiss
  • Waschtischbefestigung Oase
  • Lilly Waschbeckenunterschrank
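
To understand why each field returns these results, it may help to look at the tokens the analyzers actually produce for the misspelled term. A minimal sketch using the _analyze API, assuming the index and field names used in the queries above:

```
# sketch: inspect the tokens produced for the misspelled term
GET /articles/_analyze
{
  "field": "description_ngram",
  "text": "waschbeken"
}
```

Running the same request with "field": "description", and with the correctly spelled "waschbecken" as the text, would show which tokens the two spellings share on each field.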

Is there a way to get better inexact matches, e.g. when searching for "waschbeken" instead of "waschbecken"?
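
One direction that might be worth testing is to merge queries 1 and 2 into a single bool query, so that the exact clause, the trigram clause, and the fuzzy clause all contribute to the score. This is an untested sketch rather than a confirmed solution. The boost values 3 and 2 are assumptions chosen only to keep the ordering exact, then partial, then fuzzy.

```
# sketch: queries 1 and 2 combined (boost values are assumptions)
GET /articles/article/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "description": {
              "query": "waschbeken",
              "operator": "and",
              "boost": 3
            }
          }
        },
        {
          "match": {
            "description_ngram": {
              "query": "waschbeken",
              "operator": "and",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "waschbeken",
              "fuzziness": "AUTO",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
```

The match query also accepts a "minimum_should_match" parameter. On the trigram field it could replace "operator": "and", so that a misspelled term only needs to share most of its trigrams with a document. A suitable percentage would have to be found by experiment.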

