How can I handle typos in synonyms?

I have synonyms in synonyms.txt - "auto, vehicle => car".

In the index I have a document with the string "car" and an analyzer that handles synonyms.

When you search for "auto", for example, it also returns results for "car". But when the synonym contains a typo, such as "vhicle" or "apto", the synonym is not recognized, so the document with the original value "car" is not returned.

I tried to apply fuzziness, but it only applies to the original value in the index: it handles typos in "car" but not in the synonyms. So only an exact match on the synonym or a fuzzy query on the original string "car" works.

Here is my analyzer and the filter:

"analyzer": {
  "c_analyzer": {
    "tokenizer": "standard",
    "filter": ["lowercase", "synonym"]
  }
},
"filter": {
  "synonym": {
    "type": "synonym",
    "synonyms_path": "synonyms.txt"
  }
}

and then the query:

match: {
  word: {
    query: 'apto',
    operator: 'OR',
    fuzziness: 'auto',
    boost: 9,
    analyzer: 'c_analyzer'
  },
}

I can't find any info on this, which is really sad :sweat_smile: So any help would be useful.

Hi @gennadii

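One solution is to index all the synonym terms, such as "auto" and "vehicle", alongside "car" by using an equivalent synonym rule instead of an explicit mapping. That way fuzziness can match the typo against an indexed synonym.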

PUT idx_synonyms
{
  "mappings": {
    "properties": {
      "store": {
        "type": "text",
        "analyzer": "synonym"
      }
    }
  }, 
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "lowercase", "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [ "auto, vehicle, car" ]
          }
        }
      }
    }
  }
}

POST idx_synonyms/_doc
{
  "store": "car"
}

Query

GET idx_synonyms/_search
{
  "query": {
    "match": {
      "store": {
        "query": "apto",
        "fuzziness": "AUTO"
      }
    }
  }
}
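
To see why indexing the synonyms makes the typo findable, here is a rough Python sketch of what fuzziness AUTO allows. It is a simulation, not Elasticsearch code: the helper names (auto_max_edits, osa_distance, matching_terms) are made up for illustration, and it assumes the default AUTO thresholds (terms of 1-2 characters must match exactly, 3-5 characters allow one edit, longer terms allow two) with an adjacent transposition counted as a single edit.

```python
def auto_max_edits(term: str) -> int:
    """Edits allowed for a query term under fuzziness AUTO (assumed defaults)."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def osa_distance(a: str, b: str) -> int:
    """Edit distance where swapping two adjacent characters costs one edit."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            if (prev2 is not None and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # transposition
        prev2, prev = prev, cur
    return prev[len(b)]


def matching_terms(query_term: str, indexed_terms) -> list:
    """Indexed terms that lie within the AUTO edit budget of the query term."""
    budget = auto_max_edits(query_term)
    return sorted(t for t in indexed_terms
                  if osa_distance(query_term, t) <= budget)


# "car" indexed with the equivalent rule "auto, vehicle, car"
print(matching_terms("apto", ["auto", "vehicle", "car"]))
# "car" indexed with the explicit rule "auto, vehicle => car"
print(matching_terms("apto", ["car"]))
```

With the equivalent rule, the document "car" is indexed as auto, vehicle and car, so "apto" is one edit away from the indexed term "auto". With only "car" in the index, nothing falls within the one-edit budget.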

Thank you for the answer, it works perfectly. But for some reason it doesn't work when I load the synonyms from a file:

"filter": {
  "synonym": {
    "type": "synonym",
    "synonyms_path": "synonyms.txt"
  }
}

The synonyms themselves work when loaded from the file, but the query doesn't apply fuzziness to a synonym.

Did you change the data in the synonym file?

I updated the file to have a format:

car, auto => vehicle

and uploaded it to Elastic as a bundle update.

UPD: this solution has another issue. If the index contains "cars" and I query for "auto", it can't find "cars". How can that be solved? I want to find a document through a synonym even if there is a typo or another form of the word, either in the synonym or in the original document.

I believe this way will not work.
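
A likely reason is the rule format rather than the file itself. The small Python sketch below mimics how a Solr-format rule rewrites a token at index time; parse_rule and indexed_terms are hypothetical helpers, it only handles single-word terms, and it assumes the default expand behaviour for comma-separated rules.

```python
def parse_rule(line: str):
    """Split one Solr-format synonym rule into (inputs, outputs)."""
    if "=>" in line:
        # Explicit mapping: tokens on the left are replaced by those on the right.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
    else:
        # Equivalent synonyms: every term expands to the whole group.
        inputs = outputs = [t.strip() for t in line.split(",") if t.strip()]
    return inputs, outputs


def indexed_terms(token: str, rule: str) -> set:
    """Terms that end up in the index for one token under one rule."""
    inputs, outputs = parse_rule(rule)
    return set(outputs) if token in inputs else {token}


# Equivalent rule: "car" is stored together with its synonyms.
print(indexed_terms("car", "auto, vehicle, car"))
# Explicit mapping from the file: only "vehicle" is stored.
print(indexed_terms("car", "car, auto => vehicle"))
```

With "car, auto => vehicle" only "vehicle" reaches the index, so a fuzzy query for "apto" has no "auto" term to match. Writing the line in the file as "auto, vehicle, car" keeps all three terms, the same as the inline rule.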

Add a stemmer filter for the language you want, English in this example. That way both "car" and "cars" match.

"analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_stemmer",
              "synonym"
            ]
          }
        },
        "filter": {
          "my_stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "auto, vehicle, car"
            ]
          }
        }
      }

Great, thank you so much. The only thing I am still struggling with is typos. If the index contains, say, "vhicles" and the query is "cars", it will not find "vhicles".

Do you index misspelled words?

Yes, typos should be handled on both sides. So if I have "vhicles" in the index and the typo "aptos" in the query, it should first find the synonym "auto", then convert it to "vehicle", and then use something like fuzzy search to find "vhicles". At least that is how I pictured it in my head :smile: Is it even possible?

For me that doesn't make sense. Why index misspelled words? As I showed earlier, you can index the synonyms and use fuzziness when the search term is misspelled.

A user can create a record, and that record goes into the index, which is why it can contain a typo. So "vhicle" in the index should be found with the query "car", the same way "car" in the index is found with the query "vhicle".

This sounds very strange to me. I would not index spelling errors. From my point of view, what you want is not correct: you index the wrong term and then want fuzziness to match the right term against it.
I would review this requirement.
