ES version: 5.6.3
Problem:
I have a field with the following mapping:
"keyword_phrase_tag": {
  "type": "text",
  "analyzer": "keyword_phrase",
  "search_analyzer": "phrase_keyword_analyzer"
},
and the keyword_phrase analyzer itself is defined as:
"keyword_analyzer": {
  "filter": [
    "lowercase",
    "english_stemmer"
  ],
  "type": "custom",
  "tokenizer": "keyword"
},
If the keyword_phrase_tag value is "newest Adidas shoes", I expect the result to be "new adidas shoes", but I get "newest adidas shoes".
It turns out that with the keyword tokenizer, the Hunspell stemming implemented in the english_stemmer filter has no effect on the token:
"english_stemmer": {
  "locale": "en_US",
  "type": "hunspell",
  "dedup": "true"
}
I can confirm that the Hunspell setup itself is not the problem. If I change the tokenizer to standard, Hunspell works, but the text is split into multiple tokens:
{
  "tokens": [
    {
      "token": "new",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "adidas",
      "start_offset": 7,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "shoe",
      "start_offset": 14,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
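For reference, the comparison above can probably be reproduced without touching the index by sending a body like the one below to POST _analyze. This is only a sketch: it assumes the en_US Hunspell dictionary is installed under config/hunspell/en_US on the node, and it defines the filter inline with the same settings as english_stemmer instead of referencing it from the index settings.

```json
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "hunspell",
      "locale": "en_US",
      "dedup": "true"
    }
  ],
  "text": "newest Adidas shoes"
}
```

Changing "tokenizer" to "keyword" in the same request should show the other case, where the whole phrase reaches the filter chain as one token.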
Question
Is there a way to apply Hunspell stemming and still get a single-token output?