How to use the root of Spanish words for searches?

JavierArevalo · August 5, 2021, 4:30pm

I need to search my documents by word and retrieve every document related to a word and its derivations in Spanish. For example, if I search for "flor", I want to get all the documents that contain related words such as "florero", "florar", "florecer", "florista", "florido", and "enflorar".

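I tried the stemmer and snowball token filters configured for Spanish, but the results are not good: the related words are reduced to several different stems instead of a common root. For example, with the snowball filter: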

Config:

"filter" : {
  "my_snowball" : {
    "type" : "snowball",
    "language" : "spanish"
  }
}

Test:

GET test4/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_snowball" ],
  "text": "florero florar florecer florista florido"
}

Result:

{
  "tokens" : [
    {
      "token" : "florer",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "flor",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "florec",
      "start_offset" : 15,
      "end_offset" : 23,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "florist",
      "start_offset" : 24,
      "end_offset" : 32,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "flor",
      "start_offset" : 33,
      "end_offset" : 40,
      "type" : "word",
      "position" : 4
    }
  ]
}

And with the stemmer filter:

Config:

"filter" : {
  "my_stemmer_spanish" : {
    "type" : "stemmer",
    "language" : "spanish"
  }
}

Test:

GET test4/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_stemmer_spanish" ],
  "text": "florero florar florecer florista florido"
}

Result:

{
  "tokens" : [
    {
      "token" : "florer",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "flor",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "florec",
      "start_offset" : 15,
      "end_offset" : 23,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "florist",
      "start_offset" : 24,
      "end_offset" : 32,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "flor",
      "start_offset" : 33,
      "end_offset" : 40,
      "type" : "word",
      "position" : 4
    }
  ]
}
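
To make the problem concrete, here is a small Python sketch that groups the five test words by the stem both filters returned. The stems are copied from the _analyze output above. The helper names (STEMS, group_by_stem, words_matching) are made up for illustration, and the sketch assumes that a query for "flor" is itself analyzed to the token "flor", which I have not verified.

```python
from collections import defaultdict

# Stems copied from the _analyze results above (same for both filters).
STEMS = {
    "florero": "florer",
    "florar": "flor",
    "florecer": "florec",
    "florista": "florist",
    "florido": "flor",
}


def group_by_stem(stems):
    """Group the original words by the stem the filter produced."""
    groups = defaultdict(list)
    for word, stem in stems.items():
        groups[stem].append(word)
    return dict(groups)


def words_matching(query_stem, stems):
    """Return the words whose indexed stem equals the query stem."""
    return [word for word, stem in stems.items() if stem == query_stem]


if __name__ == "__main__":
    for stem, words in group_by_stem(STEMS).items():
        print(stem, "<-", words)
    # Assumption: the query "flor" is analyzed to the token "flor".
    print("query 'flor' would match:", words_matching("flor", STEMS))
```

The five words end up under four different stems, so under that assumption a search for "flor" would only reach "florar" and "florido", not "florero", "florecer" or "florista".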

tsullivan · August 12, 2021, 6:13pm

Hi, I think you meant to post this in the Elasticsearch category of the discussion forum: Elasticsearch - Discuss the Elastic Stack

system · September 9, 2021, 6:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
