I am working on a project with Spanish text. In short, none of the stemmers for Spanish that I have found in the documentation (only two: snowball and the standard one) give me good results. For example:
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "snowball",
      "language": "spanish"
    }
  ],
  "text": "alimento, alimentacion"
}
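For completeness, I am sending this as the body of an `_analyze` request; the full call (a sketch, assuming a local node on the default port) looks roughly like:

curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "filter": [ { "type": "snowball", "language": "spanish" } ],
  "text": "alimento, alimentacion"
}'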
The previous request returns the following tokens:
{
"tokens" : [
{
"token" : "aliment",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "alimentacion",
"start_offset" : 10,
"end_offset" : 22,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
Since "alimento" and "alimentacion" should clearly reduce to the same root, is there a way to use other stemmers?
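For reference, the other built-in option I tried is the `stemmer` token filter, which according to the Elasticsearch documentation also supports a `light_spanish` variant for Spanish. A sketch of that request (same body shape as above, only the filter changes):

{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "stemmer",
      "language": "light_spanish"
    }
  ],
  "text": "alimento, alimentacion"
}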