How can I handle typos in synonyms?

gennadii · May 23, 2023, 4:38pm

I have a synonym rule in synonyms.txt: "auto, vehicle => car".

In the index I have a document containing the string "car", and an analyzer that handles synonyms.

When I search for "auto", for example, it also returns results for "car". But when the synonym contains a typo, such as "vhicle" or "apto", the synonym is not recognized, so the document with the original value "car" is not returned.

I tried to apply fuzziness, but it only applies to the original value in the index: it handles typos in "car" but not in the synonyms. So only an exact match on the synonym or a fuzzy query on the original string "car" works.

Here is my analyzer and the filter:

"analyzer": {
  "c_analyzer": {
    "tokenizer": "standard",
    "filter": ["lowercase", "synonym"]
  }
},
"filter": {
  "synonym": {
    "type": "synonym",
    "synonyms_path": "synonyms.txt"
  }
}

and then the query:

match: {
  word: {
    query: 'apto',
    operator: 'OR',
    fuzziness: 'auto',
    boost: 9,
    analyzer: 'c_analyzer'
  },
}

I can't find any information about this, so any help would be appreciated.

RabBit_BR · May 23, 2023, 7:21pm

Hi @gennadii

One solution is to index all the synonym terms, such as "auto" and "vehicle", alongside "car". That way fuzziness can match against them.

PUT idx_synonyms
{
  "mappings": {
    "properties": {
      "store": {
        "type": "text",
        "analyzer": "synonym"
      }
    }
  }, 
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "lowercase", "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [ "auto, vehicle, car" ]
          }
        }
      }
    }
  }
}

POST idx_synonyms/_doc
{
  "store": "car"
}

Query

GET idx_synonyms/_search
{
  "query": {
    "match": {
      "store": {
        "query": "apto",
        "fuzziness": "AUTO"
      }
    }
  }
}
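
To see why this works, you can inspect what the analyzer emits at index time. This is a minimal sketch using the `_analyze` API against the `idx_synonyms` index defined above. No output is pasted here, but the expectation is that "car" comes back together with "auto" and "vehicle", so all three terms are stored for the document:

```
GET idx_synonyms/_analyze
{
  "analyzer": "synonym",
  "text": "car"
}
```

With "auto" in the index, the query "apto" is one edit away from an indexed term, and "fuzziness": "AUTO" allows one edit for terms of 3 to 5 characters.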

gennadii · May 23, 2023, 8:29pm

Thank you for the answer. It works perfectly. But for some reason it doesn't work when I try to load the synonyms from a file:

"filter": {
  "synonym": {
    "type": "synonym",
    "synonyms_path": "synonyms.txt"
  }
}

The synonyms themselves work when loaded from the file, but the query doesn't apply fuzziness to a synonym.

RabBit_BR · May 23, 2023, 8:49pm

Did you change the data in the synonym file?
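
If the file uses a "=>" rule, matching terms are replaced by the right-hand side, so the left-hand terms never end up in the index and fuzziness has nothing to match against. Here is a sketch of what synonyms.txt would need to contain for the approach above, assuming the same three terms:

```
# synonyms.txt: comma-separated terms without "=>" are treated as equivalent,
# so all of them are emitted whenever any one of them appears
auto, vehicle, car
```

Because the expansion happens at index time, documents indexed before the file changed would need to be reindexed to pick up the new terms.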

gennadii · May 23, 2023, 8:51pm

I updated the file to this format:

car, auto => vehicle

and uploaded it to Elastic as an updated bundle.

gennadii · May 23, 2023, 9:50pm

UPD: this solution has another issue. If the index contains "cars" and I query for "auto", it can't find "cars". How can this be solved? I want to find a match through the synonym even if there is a typo or another form of the word, either in the synonym or in the original document.

RabBit_BR · May 23, 2023, 10:34pm

I believe this way will not work.

Add a stemmer filter for the language you want (English in this example). That way "car" and "cars" reduce to the same term.

"analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_stemmer",
              "synonym"
            ]
          }
        },
        "filter": {
          "my_stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "auto, vehicle, car"
            ]
          }
        }
      }
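
A quick way to check the stemmer, assuming these settings are applied to the `idx_synonyms` index from earlier, is to analyze both forms and compare the tokens. No expected output is shown because the exact terms depend on the stemmer. As far as I know, synonym rules are parsed with the filters that precede the synonym filter, so the synonyms may appear in stemmed form:

```
GET idx_synonyms/_analyze
{
  "analyzer": "synonym",
  "text": "car cars"
}
```

If both words produce the same set of terms, a document containing "cars" can be reached from the query "auto".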

gennadii · May 23, 2023, 10:41pm

Great, thank you so much. The only thing I am still struggling with is typos. If the index contains, say, "vhicles" and the query is "cars", it will not find "vhicles".

RabBit_BR · May 23, 2023, 11:34pm

Do you index a misspelled word?

gennadii · May 24, 2023, 5:53am

Yes, typos should be handled on both sides. So if I have "vhicles" in the index and the typo "aptos" in the query, it should first find the synonym "auto", then convert it to "vehicle", and then use something like fuzzy search to find "vhicles". At least that is how I pictured it in my head. Is it even possible?

RabBit_BR · May 24, 2023, 1:00pm

To me that doesn't make sense. Why index misspelled words? As I showed earlier, you can index the synonyms and use fuzziness when the search term is misspelled.

gennadii · May 24, 2023, 1:38pm

A user can create a record, and that record goes into the index, which is why it can contain a typo. So "vhicle" in the index should be found with the query "car", the same way "car" in the index is found with the query "vhicle".

RabBit_BR · May 24, 2023, 2:54pm

This sounds very strange to me. I would not index spelling errors. From my point of view, what you want is not correct: you index the wrong term and want fuzziness to match the right term against the wrong term in the index.
I would review this requirement.

