How to use the root of Spanish words for searches?

I need to search my documents by word, and I want a search to return every document related to that word and its derivations in Spanish. For example, if I search for "flor", I want to get all the documents containing the words "florero", "florar", "florecer", "florista", "florido", and "enflorar".

I have tried the stemmer and snowball token filters configured for Spanish, but the results are not good: the derived words are not all reduced to the same root. For example, with the snowball filter:

Config:

"filter" : {
  "my_snowball" : {
    "type" : "snowball",
    "language" : "spanish"
  }
}

Test:

GET test4/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_snowball" ],
  "text": "florero florar florecer florista florido"
}

Result:

{
  "tokens" : [
    {
      "token" : "florer",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "flor",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "florec",
      "start_offset" : 15,
      "end_offset" : 23,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "florist",
      "start_offset" : 24,
      "end_offset" : 32,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "flor",
      "start_offset" : 33,
      "end_offset" : 40,
      "type" : "word",
      "position" : 4
    }
  ]
}

And with the stemmer filter:

Config:

"filter" : {
  "my_stemmer_spanish" : {
    "type" : "stemmer",
    "language" : "spanish"
  }
}

Test:

GET test4/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_stemmer_spanish" ],
  "text": "florero florar florecer florista florido"
}

Result:

{
  "tokens" : [
    {
      "token" : "florer",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "flor",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "florec",
      "start_offset" : 15,
      "end_offset" : 23,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "florist",
      "start_offset" : 24,
      "end_offset" : 32,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "flor",
      "start_offset" : 33,
      "end_offset" : 40,
      "type" : "word",
      "position" : 4
    }
  ]
}
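
A possible direction to test, sketched under assumptions rather than verified on this index: Elasticsearch has a `stemmer_override` token filter that maps listed words to a chosen stem and marks them as keywords, so that a stemmer placed after it should leave them alone. The index name `test5`, the filter name `my_flor_overrides`, and the rule list below are hypothetical, and the rules would have to be maintained by hand for every word family that the algorithmic stemmers split apart.

```json
PUT test5
{
  "settings": {
    "analysis": {
      "filter": {
        "my_flor_overrides": {
          "type": "stemmer_override",
          "rules": [
            "florero, florecer, florista, enflorar => flor"
          ]
        },
        "my_snowball": {
          "type": "snowball",
          "language": "spanish"
        }
      }
    }
  }
}

GET test5/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_flor_overrides", "my_snowball" ],
  "text": "florero florar florecer florista florido enflorar"
}
```

The override filter is listed before `my_snowball` on purpose: words not covered by a rule (here "florar" and "florido") still pass through the regular Spanish stemmer.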

Hi, I think you meant to post this in the Elasticsearch category of the discussion forum: Elasticsearch - Discuss the Elastic Stack
