Hi, I have a problem with the dictionary_decompounder token filter. I need the analyzer to output only the longest word that matches the dictionary.
So I set word_list to: theredapple, redapple, apple
and I also set only_longest_match: true, as follows:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_dictionary_decompound": {
          "tokenizer": "whitespace",
          "filter": [ "22_char_dictionary_decompound", "lowercase" ]
        }
      },
      "filter": {
        "22_char_dictionary_decompound": {
          "type": "dictionary_decompounder",
          "word_list": ["theredpple", "redapple", "apple"],
          "max_subword_size": 50,
          "only_longest_match": true
        }
      }
    }
  }
}
Then I tested the custom analyzer at URL/_analyze
with
{
  "text": "theredapple",
  "analyzer": "standard_dictionary_decompound"
}
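For reference, the same request could be built from Python roughly like this. It is only a sketch: the base URL (http://localhost:9200/my_index) and the helper name build_analyze_request are assumptions made for illustration, not part of my actual setup, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_analyze_request(base_url, text, analyzer):
    """Build (but do not send) a POST request to <base_url>/_analyze.

    base_url is a hypothetical index URL; replace it with your own.
    """
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node and index name, for illustration only.
req = build_analyze_request(
    "http://localhost:9200/my_index",
    "theredapple",
    "standard_dictionary_decompound",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```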
The result was:
{
  "tokens": [
    {
      "token": "theredapple",
      "start_offset": 0,
      "end_offset": 11,
      "type": "word",
      "position": 0
    },
    {
      "token": "redapple",
      "start_offset": 0,
      "end_offset": 11,
      "type": "word",
      "position": 0
    },
    {
      "token": "apple",
      "start_offset": 0,
      "end_offset": 11,
      "type": "word",
      "position": 0
    }
  ]
}
This is not what I expected. I thought that with only_longest_match: true the result would be only theredapple, without redapple and apple.
Any suggestions about setting only_longest_match: true? Or am I misunderstanding something?
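To make the expectation concrete, here is a minimal Python sketch of the behavior I am hoping for. It is not how Elasticsearch or Lucene implements the filter; the function name longest_dictionary_match is made up for illustration, and it simply keeps the single longest dictionary word found anywhere inside the token.

```python
def longest_dictionary_match(token, word_list):
    """Return the longest word from word_list contained in token, or None.

    Illustrates the expected result only; this is an assumption about the
    desired behavior, not the dictionary_decompounder implementation.
    """
    lowered = token.lower()
    matches = [word for word in word_list if word.lower() in lowered]
    return max(matches, key=len) if matches else None


words = ["theredapple", "redapple", "apple"]
print(longest_dictionary_match("theredapple", words))  # theredapple
```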
Thanks for your help.