Does EdgeNGram Tokenizer internally optimize for a finite set of values regardless of document count?

Nithin_Chandy · August 9, 2017, 4:47pm

If there are only a few possible values for a field, let's say at most 20, is it a good idea to use the edgeNgram tokenizer for autocompletion?

Just curious whether the EdgeNGram Tokenizer has any advantage over other tokenizers, especially when the set of possible values is finite.

I assume that since the values are finite, the ngrams won't consume much disk space regardless of the number of documents and can easily fit in memory (assuming a limited number of fields in each doc). Is this true?

jpountz · August 11, 2017, 8:06am

If there are only 20 values in total, I would probably use a prefix query directly, since the prefix query will be rewritten to a bool query that has 20 terms at most, which is very reasonable.
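To make the rewrite concrete, here is a minimal Python sketch that only simulates the idea: a prefix is expanded to the distinct indexed terms that start with it, so the expansion can never be larger than the number of unique values. The value list and the field name `status` are hypothetical (the thread names neither), and `expand_prefix` is an illustration, not Elasticsearch code. The request body follows the shape of the prefix query in the query DSL.

```python
import json

# Hypothetical field values; the thread only says there are at most 20.
STATUS_VALUES = [
    "pending", "processing", "paid", "shipped",
    "delivered", "returned", "refunded",
]

def expand_prefix(prefix, terms):
    """Return the distinct terms starting with `prefix`, sorted.

    This mimics the set of terms a prefix query would end up matching;
    its size is bounded by the number of unique values in the field.
    """
    return sorted({t for t in terms if t.startswith(prefix)})

def prefix_query_body(field, prefix):
    """Build a prefix query request body (the field name is an assumption)."""
    return {"query": {"prefix": {field: {"value": prefix}}}}

if __name__ == "__main__":
    print(expand_prefix("p", STATUS_VALUES))
    print(json.dumps(prefix_query_body("status", "p"), indent=2))
```

With an empty prefix every term matches, which is the worst case: as many terms as there are unique values, so 20 at most in the scenario described.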

Your assumption is correct: edge ngrams cost less when the number of unique values is bounded.
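A rough way to see this is to count terms in a toy inverted index. The sketch below is an assumption-laden simulation, not the tokenizer itself: `edge_ngrams` just takes leading substrings, the `min_gram`/`max_gram` defaults are chosen for the example, and the three values are hypothetical. It shows that the number of unique ngram terms depends only on the distinct values, while the number of postings (term-to-document entries) still grows with the document count.

```python
def edge_ngrams(value, min_gram=1, max_gram=20):
    """Return the leading substrings of `value` from min_gram to max_gram chars."""
    upper = min(len(value), max_gram)
    return [value[:n] for n in range(min_gram, upper + 1)]

def index_stats(docs, min_gram=1, max_gram=20):
    """Index one value per document; return (unique_terms, total_postings)."""
    postings = {}
    for doc_id, value in enumerate(docs):
        for gram in edge_ngrams(value, min_gram, max_gram):
            postings.setdefault(gram, set()).add(doc_id)
    total = sum(len(doc_ids) for doc_ids in postings.values())
    return len(postings), total

# Hypothetical values standing in for the "max 20 possible values".
VALUES = ["red", "green", "blue"]

small = index_stats(VALUES * 10)    # 30 documents
large = index_stats(VALUES * 1000)  # 3000 documents

if __name__ == "__main__":
    print("unique terms, postings (30 docs):  ", small)
    print("unique terms, postings (3000 docs):", large)
```

In this toy model the unique term count is the same for 30 and 3000 documents; only the postings grow, and repetitive postings for a handful of terms generally compress well.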

Nithin_Chandy · August 18, 2017, 9:15pm

Thank you

