I am working on an index where I need to return results even for one-letter searches. The default analyzer currently uses an ngram tokenizer with a range of 1 to 5. My search query runs across multiple fields, and for some entries it scores documents higher when they match a bunch of unigrams and bigrams across multiple fields than documents matching longer ngrams (3, 4, 5). The margin is often small, but it is still higher.
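For context, this is a minimal sketch of my current setup (index and field names are placeholders; `index.max_ngram_diff` has to be raised because the min/max gap exceeds the default of 1):

```json
PUT /my_index
{
  "settings": {
    "index.max_ngram_diff": 4,
    "analysis": {
      "tokenizer": {
        "ngram_1_5": { "type": "ngram", "min_gram": 1, "max_gram": 5 }
      },
      "analyzer": {
        "ngram_analyzer": { "tokenizer": "ngram_1_5", "filter": ["lowercase"] }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "ngram_analyzer" }
    }
  }
}
```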
The nature of my search is that when looking for "eti", a document containing the contiguous substring, like "blablaetiblabla", should rank higher than one like "blaeibtiblaibe" that only matches scattered unigrams and bigrams across multiple fields (it is hard to provide a clean example, since the margin is often small and it only shows up for certain combinations of data).
How should I deal with this problem? I could add fields with shingles as well, but then I can only boost whole words. I was thinking about having the default field use ngrams 1-2 and a second field use ngrams 3-5, boosting the second one in the query, but this seems kind of hacky to me. Best would be a tokenizer with a minimum ngram of 3, combined with the ability to fall back to one- or two-character matching (only if no results are found for 3 and higher), but I am not sure if that is even possible.
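To make the two-field idea concrete, this is roughly what I have in mind (field names and boost values are made up; the subfield approach uses a standard multi-field mapping):

```json
PUT /my_index
{
  "settings": {
    "index.max_ngram_diff": 2,
    "analysis": {
      "tokenizer": {
        "ngram_1_2": { "type": "ngram", "min_gram": 1, "max_gram": 2 },
        "ngram_3_5": { "type": "ngram", "min_gram": 3, "max_gram": 5 }
      },
      "analyzer": {
        "short_grams": { "tokenizer": "ngram_1_2", "filter": ["lowercase"] },
        "long_grams":  { "tokenizer": "ngram_3_5", "filter": ["lowercase"] }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "short_grams",
        "fields": {
          "long": { "type": "text", "analyzer": "long_grams" }
        }
      }
    }
  }
}
```

and then at query time boost the long-ngram subfield:

```json
GET /my_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":      { "query": "eti", "boost": 1 } } },
        { "match": { "title.long": { "query": "eti", "boost": 5 } } }
      ]
    }
  }
}
```

Is there a cleaner way to achieve the same ranking behavior?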