I use Elasticsearch 7.17.4 in Docker with a Hunspell dictionary for the Russian language.
My settings for index analysis:
"analysis": {
  "filter": {
    "my_stemmer": {
      "type": "stemmer",
      "language": "russian"
    },
    "ru_RU": {
      "locale": "ru_RU",
      "type": "hunspell"
    }
  },
  "analyzer": {
    "custom_analyzer": {
      "filter": [
        "lowercase",
        "ru_RU",
        "my_stemmer"
      ],
      "char_filter": [
        "html_strip"
      ],
      "tokenizer": "standard"
    }
  }
}
Unfortunately, full-text search does not match every word form. For example, I have the following entries:
новый колодец
нового колодца
новому колодцу
новым колодцем
новом колодце
When I make the following request:
GET http://localhost:9200/ingredient/_search?pretty
Content-Type: application/json
{
  "query": {
    "query_string": {
      "query": "колодец",
      "default_field": "name"
    }
  }
}
I get the following result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1.8472799,
    "hits": [
      {
        "_index": "ingredient",
        "_type": "_doc",
        "_id": "35",
        "_score": 1.8472799,
        "_source": {
          "name": "новый колодец",
          "id": 35,
          "_meta": {}
        }
      },
      {
        "_index": "ingredient",
        "_type": "_doc",
        "_id": "36",
        "_score": 1.8472799,
        "_source": {
          "name": "нового колодца",
          "id": 36,
          "_meta": {}
        }
      },
      {
        "_index": "ingredient",
        "_type": "_doc",
        "_id": "37",
        "_score": 1.8472799,
        "_source": {
          "name": "новому колодцу",
          "id": 37,
          "_meta": {}
        }
      },
      {
        "_index": "ingredient",
        "_type": "_doc",
        "_id": "39",
        "_score": 1.8472799,
        "_source": {
          "name": "новом колодце",
          "id": 39,
          "_meta": {}
        }
      }
    ]
  }
}
Response code: 200 (OK); Time: 65ms; Content length: 1235 bytes
The entry "новым колодцем" (the instrumental form) never appears in the results.
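One way to narrow this down might be to compare the tokens the `name` field's analyzer produces for the query word and for each stored form, using the `_analyze` API. The following is only a minimal sketch: it assumes the node is reachable at the same `localhost:9200` address as in the request above, and the helper names (`build_analyze_request`, `analyze`, `unmatched_forms`) are hypothetical, chosen just for this illustration.

```python
import json
from urllib import request

ES_URL = "http://localhost:9200"  # assumption: same host as in the search request
INDEX = "ingredient"              # index name taken from the search request
FIELD = "name"                    # field queried via default_field


def build_analyze_request(text):
    """Build an _analyze request that runs the field's analyzer on text."""
    body = json.dumps({"field": FIELD, "text": text}).encode("utf-8")
    return request.Request(
        f"{ES_URL}/{INDEX}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text):
    """Return the tokens emitted for text (requires a running node)."""
    with request.urlopen(build_analyze_request(text)) as resp:
        payload = json.load(resp)
    return [t["token"] for t in payload["tokens"]]


def unmatched_forms(query_tokens, forms_tokens):
    """Return the forms whose tokens share nothing with the query tokens."""
    wanted = set(query_tokens)
    return [form for form, toks in forms_tokens.items()
            if not wanted & set(toks)]
```

With the node running, something like `unmatched_forms(analyze("колодец"), {f: analyze(f) for f in forms})`, where `forms` is the list of entries above, should show which forms end up with no token in common with the query. If "новым колодцем" is the only one listed, the tokens returned by `analyze("колодцем")` would indicate whether the Hunspell filter or the stemmer produces the diverging token.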
My question is: is this a problem with my Hunspell dictionary? Should I look for a version of it with more data? Or is it a problem with my settings? Did I miss something?