Generating same token for related words

RabBit_BR · January 26, 2024, 9:15pm

Hello everybody.

I would like to know whether an analyzer can generate the same token for the following words: "bronzeadora", "bronze", "bronzeado". The token I need for all three words is "bronz".

I tried the stemmer token filter, but without success, as you can see:

GET _analyze
{
  "text": [
    "bronzeadora",
    "bronze",
    "bronzeado"
  ],
  "tokenizer": "standard",
  "filter": [
    {
      "type": "stemmer",
      "language": "brazilian"
    }
  ]
}

Response:

{
  "tokens": [
    {
      "token": "bronzeador",
      "start_offset": 0,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bronz",
      "start_offset": 12,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 101
    },
    {
      "token": "bronze",
      "start_offset": 19,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 202
    }
  ]
}

I know I can solve the problem with synonyms, but I wanted to make sure there isn't some other filter that can do this.
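
One filter that might fit is stemmer_override, which applies explicit stemming rules and marks the matched tokens as keywords so that a later stemmer leaves them alone. The request below is only a minimal sketch, assuming the filter can be defined inline in _analyze the same way as the stemmer filter above. The single rule is an illustrative assumption covering just these three words, and it is placed before the stemmer so that the override is applied first. It is intended to map all three words to "bronz".

GET _analyze
{
  "text": [
    "bronzeadora",
    "bronze",
    "bronzeado"
  ],
  "tokenizer": "standard",
  "filter": [
    {
      "type": "stemmer_override",
      "rules": [
        "bronzeadora, bronzeado, bronze => bronz"
      ]
    },
    {
      "type": "stemmer",
      "language": "brazilian"
    }
  ]
}

Like synonyms, this still needs one rule per word family, so it is a targeted correction for the stemmer rather than a general solution.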

