Deprecation: Deprecated big difference between max_gram and min_gram in NGram

We have a use case where we have to search a field (displayName) whose value can contain multiple words. For instance: "displayName": "Once Upon a Midnight Dreary".
We have a requirement that autocomplete functionality must work when searching on any one of the words in the value, including part of a word. For instance, querying on "night" must find the above value. This means we cannot use edge_ngram, which would only find the value when searching on a word prefix such as "Once", "Upo", or "Midni".
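(For illustration, a query like the following is what we mean by searching on "night"; the index name is just a placeholder:)

```json
GET /titles/_search
{
  "query": {
    "match": {
      "displayName": "night"
    }
  }
}
```

With an ngram filter of 1-10, the indexed terms for "Midnight" include the substring "night" itself, so this match succeeds.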

We've been using the following autocomplete_filter mapping for quite some time now which works great and meets our searching requirements:
"analysis": { "filter": { "autocomplete_filter": { "type": "ngram", "min_gram": 1, "max_gram": 10 } }

However, I noticed recently that when posting this mapping to a new index in Elasticsearch, I get the warning
"Deprecation: Deprecated big difference between max_gram and min_gram in NGram", which states that the difference between min_gram and max_gram needs to be <= 1.

If this is being deprecated, what is an alternative solution that would satisfy all of our requirements, notably the examples above? edge_ngram will not work, and ngram with min_gram = 2 and max_gram = 3 will not work either, since if I type "night" the above title is not found.

Also, when, and in what version, will this become an actual breaking change? Will existing index mappings defined in this manner continue to work while new mappings defined this way do not? How can we retain this functionality without having to reinvent the wheel?

Thank you.

I have the same question and could not find an answer anywhere. It looks like someone used a massive difference between min_gram and max_gram and that resulted in the deprecation. I would also like to get an answer, since I am in the same quandary as you. Did you get an answer?

I haven't found any answers or received any recommendations. For now we convinced the business to use edge_ngram for our use case, as it at least allows a starts-with search on each word in a phrase.

Thanks. I feel that this will be a problem, and I posted a comment on the GitHub issues. Our mandate is to support wildcard-like searches, and one of the reasons we moved away from SQL is that SQL text searches bog the DBMS down. This limitation should be voted down so that the developer can use his/her discretion instead. I don't see an option, and I can't get any answers either.

There is another option I considered trying, though I have not tried it yet and am not even sure it is possible:
using an edge_ngram analyzer of 2-10 (or even 2-20) to handle the "starts with" search on each word, and adding an ngram analyzer of 2-3 to handle "contains" and "ends with" searches of 2-3 characters. I'm not sure it is possible to have two analyzers on one field, but it may be worth a try; see the sketch below.
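Something along these lines is what I have in mind, using multi-fields to hang two differently analyzed sub-fields off displayName (untested; the index, analyzer, and sub-field names are placeholders, and the typeless mapping syntax assumes 7.x):

```json
PUT /titles
{
  "settings": {
    "analysis": {
      "filter": {
        "prefix_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 10 },
        "infix_filter":  { "type": "ngram",      "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "prefix_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "prefix_filter"]
        },
        "infix_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "infix_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "displayName": {
        "type": "text",
        "fields": {
          "prefix": { "type": "text", "analyzer": "prefix_analyzer" },
          "infix":  { "type": "text", "analyzer": "infix_analyzer" }
        }
      }
    }
  }
}
```

A query would then hit both sub-fields, e.g. a multi_match over displayName.prefix and displayName.infix. Note the ngram part stays at 2-3, so its min/max difference is 1 and the deprecation warning would not apply.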

I've changed the index setting max_ngram_diff as documented at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html to "fix" that deprecation message.
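For reference, that looks something like the following ("titles" and the 1-10 grams are just the example from earlier in this thread):

```json
PUT /titles
{
  "settings": {
    "index": {
      "max_ngram_diff": 9
    },
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      }
    }
  }
}
```

Setting max_ngram_diff to at least max_gram minus min_gram (9 here) makes the warning go away.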

Thanks Markus. Will try that if we need to go back to ngram for business reasons.
