Problem with Search-time Synonyms

I have an index with synonyms:

"index": {
  "analysis": {
    "analyzer": {
      "index_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "my_stemmer"
        ]
      },
      "search_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonym_filter",
          "my_stemmer"
        ]
      }
    },
    "filter": {
      "synonym_filter": {
        "type": "synonym_graph",
        "synonyms_path": "/app/config/synonyms.txt",
        "updateable": true
      },
      "my_stemmer": {
        "type": "stemmer",
        "language": "light_english"
      }
    }
  }
}

with this mapping

"mappings": {
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "index_analyzer",
      "search_analyzer": "search_analyzer"
    }
  }
}

As you can see, I only use the synonym filter at search time.
Now let's say I have a synonym rule like this:
nana, grammy => grandma
When I search for nana, the search_analyzer replaces nana with grandma, so only documents that contain the word grandma are returned and documents that contain nana are ignored.
Do you have any suggestions on what I should do? Should I add synonyms at index time?

Hi @elleWajexi,

Welcome! Your synonym rule maps all of the terms on the left (in this case nana and grammy) to grandma, which is why only grandma is searched for.

Have you tried using an equivalent synonym list such as nana, grammy, grandma, which expands each term to all of the others instead of mapping everything to grandma?
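
A quick way to compare the two rule styles is the _analyze API with an inline synonym list. This is only a sketch: it assumes inline "synonyms" instead of the synonyms.txt file, and it leaves out the stemmer so the raw synonym tokens are easier to read.

```console
# Sketch only: inline synonyms stand in for /app/config/synonyms.txt.
# Swap the rule for "nana, grammy => grandma" to see the explicit mapping instead.
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["nana, grammy, grandma"]
    }
  ],
  "text": "nana"
}
```

With the equivalent rule the response should list all three terms at the same position, while the explicit mapping should leave only the right-hand side.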

I did try this. The problem I have with equivalent synonyms is how they behave with multi-word synonyms.
For example, with the rule nana, grammy, grandma, grand mother
a search for nana produces these tokens:

{
  "tokens": [
    {
      "token": "nana",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "grammy",
      "start_offset": 0,
      "end_offset": 4,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "grandma",
      "start_offset": 0,
      "end_offset": 4,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "grand",
      "start_offset": 0,
      "end_offset": 4,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "mother",
      "start_offset": 0,
      "end_offset": 4,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

This way, a document that contains grand or mother several times could get a higher score and show up as an irrelevant result.

Can you explain what result you want in the grand mother case? Should the additional synonym not be grandmother all one word?

I want a search for grandmother to also match misspellings like grand mother.

Could you try a mapping synonym alongside your expansion rule? Perhaps something like the below:

GET /_analyze
{
  "tokenizer": "standard",
  "filter" : [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["grand mother => nana, grammy, grandma, grandmother", "nana, grammy, grandma, grandmother"]
    }
  ],
  "text" : "Looking for my grand mother"
}

Resulting tokens:

{
  "tokens": [
    {
      "token": "looking",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "for",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "my",
      "start_offset": 12,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "nana",
      "start_offset": 15,
      "end_offset": 27,
      "type": "SYNONYM",
      "position": 3
    },
    {
      "token": "grammy",
      "start_offset": 15,
      "end_offset": 27,
      "type": "SYNONYM",
      "position": 3
    },
    {
      "token": "grandma",
      "start_offset": 15,
      "end_offset": 27,
      "type": "SYNONYM",
      "position": 3
    },
    {
      "token": "grandmother",
      "start_offset": 15,
      "end_offset": 27,
      "type": "SYNONYM",
      "position": 3
    }
  ]
}

Hi, thanks for the reply.
I ended up doing the same thing and added the
"grand mother => nana, grammy, grandma, grandmother" mapping.

The only issue is that when someone searches for grand mother, the query is rewritten to the single-word terms, so documents that contain grand mother as two words are ignored. To fix this I had to add
grand mother => nana, grammy, grandma, grandmother to the index-time analyzer too.
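
For anyone landing here later, an index-time setup along these lines might look roughly like the sketch below. The index name my-index and the filter name index_synonym_filter are made up for illustration, and the rule is inlined rather than read from a file. It uses the plain synonym filter without "updateable", because as far as I know synonym_graph and updateable filters are meant for search analyzers only. The search_analyzer from the original settings is left out for brevity.

```console
# Sketch only: my-index and index_synonym_filter are hypothetical names.
# The multi-word mapping is applied at index time with the plain "synonym" filter.
PUT /my-index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "index_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "index_synonym_filter",
              "my_stemmer"
            ]
          }
        },
        "filter": {
          "index_synonym_filter": {
            "type": "synonym",
            "synonyms": ["grand mother => nana, grammy, grandma, grandmother"]
          },
          "my_stemmer": {
            "type": "stemmer",
            "language": "light_english"
          }
        }
      }
    }
  }
}
```

Documents indexed before a change like this keep their old tokens, so existing data would need to be reindexed for the index-time rule to take effect.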

Glad you got something working in the end. Thanks for sharing your final approach @elleWajexi!
