I'm creating an index that contains both short and long words (more than 18
characters).
Currently I am using an ngram token filter with a minimum of 1 and a maximum
of 20.
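For context, this is roughly what my analysis settings look like (a minimal sketch in Elasticsearch syntax; the index, filter, and analyzer names are just placeholders, and recent Elasticsearch versions also need `index.max_ngram_diff` raised to allow a 1-20 spread):

```
PUT /my_index
{
  "settings": {
    "index": {
      "max_ngram_diff": 19
    },
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "ngram_filter"]
        }
      }
    }
  }
}
```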
Words that have more than 20 characters will not match because they are not
fully indexed.
It is possible to index those words 'untouched', but then I still cannot
get matches for substrings with a length between 20 and the full word
length.
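For example, I could add the untouched copy as a keyword sub-field next to the ngram-analyzed field (a sketch; the field names `word` and `raw` are placeholders):

```
PUT /my_index/_mapping
{
  "properties": {
    "word": {
      "type": "text",
      "analyzer": "ngram_analyzer",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}
```

The `raw` sub-field matches the full word exactly, but a 25-character substring of a 30-character word still finds nothing.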
I've considered increasing the ngram maximum to 50, but I think that will
hurt performance, and in theory it does not solve the problem (I cannot
think of any off-hand, but words may have more than 50 characters).
I'm looking for the 'best' solution to this problem: one that keeps my
performance intact and lets me find any long word by any arbitrary
substring of it.
Thanks.