Dictionary decompounder: only_longest_match doesn't work

Hi, I have a problem with the dictionary decompounder token filter. I need the analyzer to output only the longest word that matches the dictionary.
So I set word_list to: theredapple, redapple, apple
and I also set only_longest_match: true, as follows:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_dictionary_decompound": {
          "tokenizer": "whitespace",
          "filter": [ "22_char_dictionary_decompound", "lowercase" ]
        }
      },
      "filter": {
        "22_char_dictionary_decompound": {
          "type": "dictionary_decompounder",
          "word_list": ["theredpple", "redapple", "apple"],
          "max_subword_size": 50,
          "only_longest_match": true
        }
      }
    }
  }
}

Then I tested the custom analyzer by sending the following to URL/_analyze:

{
  "text": "theredapple",
  "analyzer": "standard_dictionary_decompound"
}

The result was:

{
  "tokens": [
    {
      "token": "theredapple",
      "start_offset": 0,
      "end_offset": 11,
      "type": "word",
      "position": 0
    },
    {
      "token": "redapple",
      "start_offset": 0,
      "end_offset": 11,
      "type": "word",
      "position": 0
    },
    {
      "token": "apple",
      "start_offset": 0,
      "end_offset": 11,
      "type": "word",
      "position": 0
    }
  ]
}

This is not what I expected. I thought that with only_longest_match: true the result would be only theredapple, not redapple and apple.

Any suggestions about setting only_longest_match: true? Or am I misunderstanding something?

Thanks for your help.

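That's a good question. I asked around and learned that only_longest_match means "use only the longest match for a given start point". theredapple, redapple and apple start at different character positions within the input token, so they are treated as different subwords and all of them are kept. Dictionary words that start at the same position, for example red and redapple, would return only the longest one.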
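
To make the per-start-point rule concrete, here is a rough Python sketch of the matching logic as described above. It is a simplified model, not the actual filter implementation: the function name decompound and its parameter defaults are assumptions chosen for illustration, and the second word list (red, redapple, apple) is hypothetical. Like the analyzer output above, it returns the original token first, followed by the subwords it found.

```python
def decompound(token, word_list, min_subword_size=2, max_subword_size=15,
               only_longest_match=False):
    """Simplified model of dictionary decompounding (illustrative only)."""
    dictionary = {w.lower() for w in word_list}
    text = token.lower()
    subwords = []
    # Try every character position in the token as a possible start point.
    for start in range(len(text) - min_subword_size + 1):
        matches = []
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(text):
                break
            candidate = text[start:start + size]
            if candidate in dictionary:
                matches.append(candidate)
        # Sizes grow in the inner loop, so the last match is the longest one
        # for THIS start point. Other start points are handled independently.
        if only_longest_match and matches:
            matches = [matches[-1]]
        subwords.extend(matches)
    # The original token is kept, followed by the subwords found.
    return [token] + subwords


# Word list from the question: every match starts at a different position,
# so only_longest_match has nothing to discard.
question = decompound("theredapple", ["theredpple", "redapple", "apple"],
                      max_subword_size=50, only_longest_match=True)
print(question)

# Hypothetical word list where "red" and "redapple" share a start point.
words = ["red", "redapple", "apple"]
print(decompound("redapples", words, only_longest_match=False))
print(decompound("redapples", words, only_longest_match=True))
```

In this model the setting only makes a difference in the last call, where red is dropped because redapple starts at the same position and is longer. apple is still returned because it starts at a different position.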

Hope that helps.

Ohhh, I got it. Thank you!
