I need to search my documents by word. I want to retrieve all the documents related to a word and its derivations in Spanish. For example, if I search for "flor", I want to get all the documents related to the words "florero", "florar", "florecer", "florista", "florido" and "enflorar".
I tried the stemmer and snowball token filters configured for Spanish, but the results are not good: the derived words are reduced to several different stems. For example:
Config:
"filter" : {
  "my_snowball" : {
    "type" : "snowball",
    "language" : "spanish"
  }
}
Test:
GET test4/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_snowball" ],
  "text": "florero florar florecer florista florido"
}
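The offsets in the results come from the whitespace tokenizer, which splits the text before any stemming happens. The Python sketch below is only a rough approximation of that step: it uses `re.finditer` with `\S+` as a stand-in for the Elasticsearch tokenizer (not its real implementation), and `whitespace_tokens` is a hypothetical helper name. It reproduces the same start offsets, end offsets and positions that `_analyze` reports:

```python
import re

TEXT = "florero florar florecer florista florido"


def whitespace_tokens(text: str) -> list[tuple[str, int, int, int]]:
    """Approximate a whitespace tokenizer as (token, start, end, position).

    Hypothetical helper: splits on runs of whitespace only, with no
    lowercasing or stemming, so the tokens are the raw words.
    """
    return [
        (match.group(), match.start(), match.end(), position)
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]


for token in whitespace_tokens(TEXT):
    print(token)
# ('florero', 0, 7, 0)
# ('florar', 8, 14, 1)
# ('florecer', 15, 23, 2)
# ('florista', 24, 32, 3)
# ('florido', 33, 40, 4)
```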
Result:
{
  "tokens" : [
    {
      "token" : "florer",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "flor",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "florec",
      "start_offset" : 15,
      "end_offset" : 23,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "florist",
      "start_offset" : 24,
      "end_offset" : 32,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "flor",
      "start_offset" : 33,
      "end_offset" : 40,
      "type" : "word",
      "position" : 4
    }
  ]
}
And with the stemmer filter:
Config:
"filter" : {
  "my_stemmer_spanish" : {
    "type" : "stemmer",
    "language" : "spanish"
  }
}
Test:
GET test4/_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "my_stemmer_spanish" ],
  "text": "florero florar florecer florista florido"
}
Result:
{
  "tokens" : [
    {
      "token" : "florer",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "flor",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "florec",
      "start_offset" : 15,
      "end_offset" : 23,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "florist",
      "start_offset" : 24,
      "end_offset" : 32,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "flor",
      "start_offset" : 33,
      "end_offset" : 40,
      "type" : "word",
      "position" : 4
    }
  ]
}
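Both filters produce the same stems for this input. Since a query on a stemmed field compares the analyzed terms exactly, only words whose stem is identical to the query's stem can match. The Python sketch below copies the stems from the `_analyze` output above into a dictionary; `group_by_stem` and `words_matching` are hypothetical helpers, and it assumes (not verified here with `_analyze`) that the query "flor" is itself analyzed to the term "flor". Under that assumption, the five words end up as four separate index terms and only two of them would match:

```python
from collections import defaultdict

# Word -> stem, copied from the snowball/stemmer _analyze output above.
ANALYZED = {
    "florero": "florer",
    "florar": "flor",
    "florecer": "florec",
    "florista": "florist",
    "florido": "flor",
}


def group_by_stem(index: dict[str, str]) -> dict[str, list[str]]:
    """Group the original words by the term actually stored in the index."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for word, stem in index.items():
        groups[stem].append(word)
    return dict(groups)


def words_matching(query_stem: str, index: dict[str, str]) -> list[str]:
    """Return the words whose stem equals the (already analyzed) query term."""
    return [word for word, stem in index.items() if stem == query_stem]


print(group_by_stem(ANALYZED))
# {'florer': ['florero'], 'flor': ['florar', 'florido'], 'florec': ['florecer'], 'florist': ['florista']}

# Assumption: the query "flor" is analyzed to "flor" by the same filter.
print(words_matching("flor", ANALYZED))
# ['florar', 'florido']
```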