Does EdgeNGram Tokenizer internally optimize for a finite set of values regardless of document count?

If there are only a few possible values for a field, let's say at most 20, is it a good idea to use the edge_ngram tokenizer for autocompletion?

Just curious whether the EdgeNGram tokenizer has any advantage over other tokenizers, especially when the set of possible values is finite.

I assume that since the values are finite, the n-grams won't consume much disk space regardless of the number of documents, and the resulting terms can easily fit in memory (assuming a limited number of fields per document). Is this true?
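For context, the kind of setup I have in mind is roughly the classic autocomplete mapping below. This is only a minimal sketch written against the 7.x Python client; the `categories` index and `category` field are made-up names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Edge n-grams at index time, standard analyzer at search time:
# the usual "search-as-you-type" style setup.
es.indices.create(
    index="categories",
    body={
        "settings": {
            "analysis": {
                "tokenizer": {
                    "autocomplete_edge": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 20,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "autocomplete_edge",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "category": {
                    "type": "text",
                    "analyzer": "autocomplete",
                    "search_analyzer": "standard",
                }
            }
        },
    },
)
```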

If there are only 20 values in total, I would probably use a prefix query directly, since the prefix query will be rewritten to a bool query that has 20 terms at most, which is very reasonable.
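Something along these lines, assuming the field holding the ~20 values is simply mapped as `keyword` rather than the edge_ngram setup above (hypothetical index/field names, same 7.x Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# With at most ~20 distinct values, a plain keyword field plus a prefix query
# is enough: the prefix expands internally to at most 20 matching terms.
resp = es.search(
    index="categories",
    body={
        "query": {
            "prefix": {
                "category": {
                    "value": "elec"  # whatever the user has typed so far
                }
            }
        }
    },
)

for hit in resp["hits"]["hits"]:
    print(hit["_source"]["category"])
```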

Your assumption is correct: the cost of edge n-grams stays low when the number of unique values is bounded.


Thank you
