Part-of-word matching challenge


(Maarten Roosendaal) #1

Hi,

We're looking for a way to match on parts of words for single-term queries, but not for all words.

For example (it's Dutch):

  • searching for 'verlichting' should match on 'tuinverlichting' or 'wandverlichting', but
  • searching for 'pop' should NOT match on 'popular'
  • searching for 'bed' should match on 'beddengoed' or 'dekbedden' but NOT on 'bedrading' or 'bedrijf'

What is the best strategy?

Maarten


(Jörg Prante) #2

One method is to implement a word decompounding token filter for the Dutch language.
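
To illustrate the idea, here is a rough Python sketch of dictionary-based decompounding. It is a simplified illustration, not the actual Lucene/Elasticsearch implementation: the function name `decompound` is hypothetical, the word list is made up from the examples in this thread, and the size limits of 2 and 15 are assumed defaults.

```python
def decompound(token, dictionary, min_subword_size=2, max_subword_size=15):
    """Return the original token followed by every dictionary word found inside it."""
    token = token.lower()
    subwords = []
    # Try every start position and every allowed subword length.
    for start in range(len(token)):
        for length in range(min_subword_size, max_subword_size + 1):
            end = start + length
            if end > len(token):
                break
            part = token[start:end]
            if part in dictionary:
                subwords.append(part)
    return [token] + subwords


# Hypothetical word list; 'pop' is deliberately left out.
dictionary = {"tuin", "wand", "verlichting"}

print(decompound("tuinverlichting", dictionary))
# ['tuinverlichting', 'tuin', 'verlichting']
print(decompound("popular", dictionary))
# ['popular']
```

A query for 'verlichting' can then match the subword emitted for 'tuinverlichting', while 'popular' produces no extra tokens because 'pop' is not in the word list. Note that with this naive scan, adding 'bed' to the word list would also emit 'bed' for 'bedrijf', so such cases likely need extra handling.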


(Maarten Roosendaal) #3

Hi,

Thanks for your reply. I've looked at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-compound-word-tokenfilter.html#_dictionary_decompounder but the documentation is lacking.

I've tried, see https://gist.github.com/anonymous/f6f3067b02af50928751127a1e351e63
but somehow it does not work at all.

Maybe you can spot the error? I'm using Elasticsearch 5.2.

Is this what you meant, by the way?

Thanks,
Maarten


(Maarten Roosendaal) #4

Figured out the decompounding problem: I had put the quotes around the whole [ ... ] list instead of around each individual word. In combination with stemmer_override, which let me manually override the stemming of specific words, I got better results.
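
For anyone running into the same thing, a minimal sketch of what the corrected settings might look like, written as a Python dict that serializes to the JSON request body. The filter and analyzer names, the word list, and the override rule are hypothetical examples, not the contents of the gist above.

```python
import json

# Hypothetical index settings; all names and words below are illustrative.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "dutch_decompounder": {
                    "type": "dictionary_decompounder",
                    # Each word is its own quoted string inside the array.
                    # The broken variant quoted the whole list instead,
                    # e.g. "word_list": "[verlichting, bed]".
                    "word_list": ["verlichting", "bed", "tuin", "wand"],
                },
                "dutch_override": {
                    "type": "stemmer_override",
                    # Assumed example rule: map a word to a chosen stem.
                    "rules": ["beddengoed => bed"],
                },
            },
            "analyzer": {
                "dutch_compound": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "dutch_decompounder",
                        "dutch_override",
                    ],
                }
            },
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
body = json.dumps(index_settings, indent=2)
print(body)
```

If a stemmer filter is used as well, the stemmer_override filter should come before it in the filter chain so the overridden words are not stemmed again.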

