Multiple analyzers with stemmed synonyms

Kushikawa · June 16, 2020, 2:27pm

I want to create an index with a stemmer analyzer to generalize my synonyms, and then apply it in other analyzers.
As a simplified example: I want to use all of these synonyms [beautiful, pretty, beauteous, gorgeous] in multiple analyzers when searching for "beauty", since "beauty" and "beautiful" share the same stem:

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "stemmer" ],
  "text": "beautiful beauty"
}
{
  "tokens": [
    {
      "token": "beauti",  ...
    },
    {
      "token": "beauti", ...
    }
  ]
}

What I have so far is

PUT /test_synonyms
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonyms": {
            "type": "synonym",
            "synonyms":  ["beautiful, pretty, beauteous, gorgeous"]
          },
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  }
}

Stemmer analyzer gives me:

GET /test_synonyms/_analyze
{
  "tokenizer": "standard",
  "filter": ["stemmer", "my_synonyms"],
  "text": "beauty"
}
{
  "tokens": [
    {
      "token": "beauti",  ...
    },
    {
      "token": "pretti", ...
    },
    {
      "token": "beauteou", ...
    },
    {
      "token": "gorgeou", ...
    }
  ]
}

Phonetic analyzer gives me:

GET /test_synonyms/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_synonyms", "my_metaphone"],
  "text": "beauty"
}
{
  "tokens": [
    {
      "token": "BT",  ...
    }
  ]
}

But "BT" doesn't match any of the tokens I get for "beautiful":

GET /test_synonyms/_analyze
{
  "tokenizer": "standard",
  "filter": ["my_synonyms", "my_metaphone"],
  "text": "beautiful"
}
{
  "tokens": [
    {
      "token": "BTFL",  ... /*beautiful*/
    },
    {
      "token": "PRT", ... /*pretty*/
    },
    {
      "token": "BTS", ... /*beauteous*/
    },
    {
      "token": "KRJS", ... /*gorgeous*/
    }
  ]
}

I was wondering if there is a way to return the exact synonym words (not their stems) while still using the stemmer to find them, and then use the result with other analyzers. Something that gives me the response above when searching for "beauty".

I tried to use the stemmer and phonetic filters together, but it gives me:

GET /test_synonyms/_analyze
{
  "tokenizer": "standard",
  "filter": ["stemmer", "my_synonyms", "my_metaphone"],
  "text": "beauty" /*or beautiful (equal responses)*/
}
{
  "tokens": [
    {
      "token": "BT", ... /*beauti*/
    },
    {
      "token": "PRT", ... /*pretti*/
    },
    {
      "token": "BT", ... /*beauteou*/
    },
    {
      "token": "KRJ", ... /*gorgeou*/
    }
  ]
}

This isn't really what I want, because when I search for "beautiful" and "beauty", the number of documents returned is different ("beautiful" scores the phonetic matches), and I want the results to be the same.

cbuescher · June 16, 2020, 5:46pm

I don't understand why you would want to do that. If you index your documents using the stemmer, docs with "beauteous" in the input will have the stemmed version written to the index. When you search them later, e.g. via synonym expansion, you want the same stemmer applied to the expanded terms; otherwise you will not match the intended documents.

Specifically:

"a gorgeous boat" will index "gorgeou" when using a stemmer.
"beauty" at search time will expand to "gorgeou", otherwise it wouldn't match the document

Am I missing something?
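
To make the point above concrete, here is a minimal sketch of one index where the same stemmer-plus-synonyms chain is used at both index and search time. The index name `test_synonyms_stemmed`, the analyzer name `stemmed_synonyms`, and the field name `description` are made up for illustration; the synonym list is the one from the question. It assumes a 7.x cluster and the Kibana console (which accepts `#` comments).

```console
# Hypothetical index: one custom analyzer that stems first, then expands synonyms.
# Because "stemmer" precedes "my_synonyms", the synonym rules themselves are
# stemmed too, matching the _analyze output shown earlier in the thread.
PUT /test_synonyms_stemmed
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["beautiful, pretty, beauteous, gorgeous"]
        }
      },
      "analyzer": {
        "stemmed_synonyms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stemmer", "my_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "stemmed_synonyms"
      }
    }
  }
}

# Index the example document; its tokens are stored in stemmed form.
PUT /test_synonyms_stemmed/_doc/1?refresh=true
{
  "description": "a gorgeous boat"
}

# The match query analyzes "beauty" with the field's analyzer, so the
# query terms are stemmed and expanded the same way as the indexed text.
GET /test_synonyms_stemmed/_search
{
  "query": {
    "match": { "description": "beauty" }
  }
}
```

With this setup the search for "beauty" should find the "a gorgeous boat" document, since both sides go through the same chain. If the query side skipped the stemmer, the expanded terms would be unstemmed and would not line up with what is stored in the index.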

Kushikawa · June 17, 2020, 12:48am

Hi @cbuescher, thank you for your reply. I'm sorry, my final goal was not as simple as I made it look. I'm new to Elasticsearch, and my problem is related to specific Portuguese cases. I updated my question! Please let me know if it makes a bit more sense now or if I'm going in the wrong direction.

system · July 15, 2020, 12:48am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
