Compound Word Token Filter

(nachiket) #1

Is the compound word token filter useful only for foreign languages, or
can it be useful for English too?
My application takes a character sequence as input, and my job is to
understand the content. The content is mostly proper nouns, though not
necessarily. Can I build my own dictionary, where I first check whether
a word is present and, if it is not, add it?
I have looked at the Synonym token filter, but I don't understand where
the Compound Word token filter would be used.
Can somebody help with this issue?
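
To make the dictionary idea concrete, here is a minimal sketch in plain Python of what I have in mind. It is not Elasticsearch code. The names `lookup_or_add`, `decompound`, `known_words` and `min_subword_size` are made up for illustration, and the decompounding part only reflects my assumption of what a dictionary-based compound word filter does (keep the original token and also emit any dictionary words found inside it), not the actual implementation.

```python
def lookup_or_add(word, known_words):
    """Return True if the word was already known; otherwise add it and return False."""
    key = word.lower()
    if key in known_words:
        return True
    known_words.add(key)
    return False


def decompound(token, known_words, min_subword_size=3):
    """Return the original token followed by every dictionary word found inside it.

    Assumption: a brute-force scan over all substrings, which is only a
    rough stand-in for what a real dictionary decompounder might do.
    """
    lowered = token.lower()
    parts = [token]
    for start in range(len(lowered)):
        for end in range(start + min_subword_size, len(lowered) + 1):
            piece = lowered[start:end]
            # Skip the whole token itself; it is already the first element.
            if piece != lowered and piece in known_words:
                parts.append(piece)
    return parts


known_words = {"data", "base", "book"}

# "store" is not in the dictionary yet, so it gets added.
print(lookup_or_add("store", known_words))   # False
print(decompound("Database", known_words))   # ['Database', 'data', 'base']
print(decompound("bookstore", known_words))  # ['bookstore', 'book', 'store']
```

If that is roughly the behaviour, then for English it would mainly help with closed compounds such as "bookstore", and I am not sure how much it would help with proper nouns.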

