Ngrams - choosing the value of N

I need to perform substring search on the content of files. I decided to go with the ngram approach: index the entire content of all the files as ngrams (I am using an ngram tokenizer) and then query them with a match phrase query. So, if I index using n=3, the search text "client" would look for a document containing cli, lie, ien, ent in that order.
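The tokenize-then-phrase-match idea can be modeled in a few lines of plain Python. This is a simplified sketch, not Elasticsearch's implementation: `char_ngrams` and `phrase_match` are hypothetical helpers, and the sketch assumes min_gram = max_gram = n, lowercased text, and one consecutive position per gram.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams of `text`, in order.
    Simplified stand-in for an ngram tokenizer with min_gram = max_gram = n."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def phrase_match(document: str, query: str, n: int) -> bool:
    """True if all query grams occur in the document at consecutive positions,
    which is what a phrase query over the gram positions checks.
    Assumes each gram gets its own consecutive position."""
    doc_grams = char_ngrams(document, n)
    query_grams = char_ngrams(query, n)
    if not query_grams:  # query shorter than n produces no grams at all
        return False
    k = len(query_grams)
    return any(doc_grams[i:i + k] == query_grams
               for i in range(len(doc_grams) - k + 1))


print(char_ngrams("client", 3))                        # ['cli', 'lie', 'ien', 'ent']
print(phrase_match("The client library", "client", 3))  # True
print(phrase_match("lient cli", "client", 3))           # False: grams present, wrong order
```

The last call shows why the phrase query filters out false matches: the document contains every gram of "client", but not at consecutive positions.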
I understand that indexing ngrams will definitely increase the index size, but my concern now is to choose a value of n that keeps that increase as small as possible.
Intuitively, n=1 or n=2 would produce the fewest distinct grams. False matches are not a concern for me, since the match phrase query takes care of them. What would be the drawbacks of picking a very low value of n?
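One rough way to look at the size question: a smaller n gives fewer distinct grams, but a document still contributes about one gram position per character whatever n is, so each gram simply ends up with more occurrences to store and to walk at query time. The sketch below only counts grams in memory for a made-up sample string; `gram_stats` is a hypothetical helper, and the numbers it prints are not a measurement of real index size.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams of `text` (lowercased), in order."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def gram_stats(text: str, n: int) -> tuple[int, int, int]:
    """Return (distinct grams, total gram positions, occurrences of the most common gram)."""
    counts = Counter(char_ngrams(text, n))
    return len(counts), sum(counts.values()), max(counts.values())


# Hypothetical sample text, only for illustration.
sample = "the client sends a request and the client receives a response"
for n in (1, 2, 3):
    distinct, total, busiest = gram_stats(sample, n)
    # total is always len(sample) - n + 1; only the spread across grams changes.
    print(f"n={n}: distinct={distinct} positions={total} most_common={busiest}")
```

Total positions barely change with n, while the most common gram's occurrence count can only shrink as n grows, since every occurrence of a longer gram starts at an occurrence of its first character.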
