Need Hunspell Analysis in Single Token Keyword

Stephanus_Budiwijaya · December 19, 2019, 3:07am

ES version: 5.6.3

Problem:
I have a field with the following mapping:

"keyword_phrase_tag": {
    "type": "text",
    "analyzer": "keyword_phrase",
    "search_analyzer": "phrase_keyword_analyzer"
},

and the keyword_phrase analyzer itself is defined as:

"keyword_analyzer": {
    "filter": [
      "lowercase",
      "english_stemmer"
    ],
    "type": "custom",
    "tokenizer": "keyword"
},

If the keyword_phrase_tag value is "newest Adidas shoes", I expect the result to be "new adidas shoes", but I get "newest adidas shoes".
It seems that with the keyword tokenizer, the Hunspell stemming configured in the english_stemmer filter has no effect:

"english_stemmer": {
    "locale": "en_US",
    "type": "hunspell",
    "dedup": "true"
}

I can confirm that the Hunspell setup itself is not at fault. If I change the tokenizer to standard, Hunspell stemming works, but the output is split into multiple tokens:


{
  "tokens": [
    {
      "token": "new",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "adidas",
      "start_offset": 7,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "shoe",
      "start_offset": 14,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
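
To make the difference between the two tokenizers concrete, here is a minimal Python sketch. It does not call Elasticsearch or Hunspell. The STEMS table and all function names are hypothetical stand-ins, and the sketch assumes that a dictionary stemmer looks up each token as a whole, which would explain why a single three-word token comes back unchanged.

```python
# Toy stand-in for a dictionary stemmer (assumption: NOT the real Hunspell en_US data).
STEMS = {"newest": "new", "shoes": "shoe"}

def stem(token):
    """Look up the whole token in the dictionary; return it unchanged if absent."""
    return STEMS.get(token, token)

def keyword_analyze(text):
    """Mimic keyword tokenizer + lowercase + stemmer: the entire input is one token."""
    return [stem(text.lower())]

def standard_analyze(text):
    """Mimic a word-splitting tokenizer + lowercase + stemmer (splits on whitespace only)."""
    return [stem(tok) for tok in text.lower().split()]

def single_token_analyze(text):
    """Stem word by word, then join the stems back into a single token."""
    return [" ".join(standard_analyze(text))]

print(keyword_analyze("newest Adidas shoes"))       # ['newest adidas shoes']
print(standard_analyze("newest Adidas shoes"))      # ['new', 'adidas', 'shoe']
print(single_token_analyze("newest Adidas shoes"))  # ['new adidas shoe']
```

In this sketch the joined form is computed outside the analyzer, for example in application code before indexing. It yields "new adidas shoe", matching the per-word stems shown above, and whether an equivalent chain can be configured inside an Elasticsearch 5.6.3 analyzer is left open here.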

Question
Is there a way to apply Hunspell stemming and still get a single token as output?

system · January 16, 2020, 3:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
