Deprecation: Deprecated big difference between max_gram and min_gram in NGram

We have a use case where we have to search a field (displayName) whose value can contain multiple words. For instance: "displayName": "Once Upon a Midnight Dreary".
We have a requirement that autocomplete functionality must work when searching on any one of the words in the value, including part of a word. For instance, querying on "night" must find the above value. This means we cannot use edge_ngram, which would only find the value when searching on a word prefix such as "Once", "Upo", or "Midni".
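(For illustration, a query like the following is what we mean by searching on "night"; the index name is just a placeholder:)

```json
GET /titles/_search
{
  "query": {
    "match": {
      "displayName": "night"
    }
  }
}
```

With an ngram filter of 1-10, the indexed terms for "Midnight" include the substring "night" itself, so this match succeeds.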

We've been using the following autocomplete_filter mapping for quite some time now which works great and meets our searching requirements:
"analysis": { "filter": { "autocomplete_filter": { "type": "ngram", "min_gram": 1, "max_gram": 10 } }

However, I noticed recently that when posting this mapping to a new index in Elasticsearch, I get the warning
"Deprecation: Deprecated big difference between max_gram and min_gram in NGram", which states that the difference between min_gram and max_gram needs to be <= 1.

If this is being deprecated, what is an alternative solution that would satisfy all of our requirements, notably the examples above? edge_ngram will not work, and ngram with min_gram = 2 and max_gram = 3 will not work either, since if I type "night" the above title is not found.

Also, when, and in what version, will this become an actual breaking change? Will existing index mappings defined in this manner continue to work while new mappings defined this way do not? How can we retain this functionality without having to reinvent the wheel?

Thank you.

I have the same question and could not find an answer anywhere. It looks like someone used a massive difference between min_gram and max_gram and that resulted in the deprecation. I would also like to get an answer, since I am in the same quandary as you. Did you get an answer?

I haven't found any answers or received any recommendations. For now we convinced the business to use edge_ngram for our use case, as it at least allows a starts-with search on each word in a phrase.

Thanks. I feel that this will be a problem, and I posted a comment on the GitHub issues. Our mandate is to support wildcard-like searches, and one of the reasons we moved away from SQL is that SQL text searches bog the DBMS down. This limitation should be voted down so that the developer can use his/her discretion instead. I don't see an option, and I can't get any answers either.

There is another option I considered trying, though I have not tried it yet and am not even sure it is possible:
using an edge_ngram analyzer of 2-10 (or even 2-20) to handle the "starts with" search on each word, and adding an ngram analyzer of 2-3 to handle "contains" and "ends with" searches of 2-3 characters. I'm not sure it is possible to have two analyzers on one field, but it may be worth a try; see the sketch below.
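Something along these lines is what I have in mind, using multi-fields to hang two differently analyzed sub-fields off displayName (untested; the index, analyzer, and sub-field names are placeholders, and the typeless mapping syntax assumes 7.x):

```json
PUT /titles
{
  "settings": {
    "analysis": {
      "filter": {
        "prefix_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 10 },
        "infix_filter":  { "type": "ngram",      "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "prefix_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "prefix_filter"]
        },
        "infix_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "infix_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "displayName": {
        "type": "text",
        "fields": {
          "prefix": { "type": "text", "analyzer": "prefix_analyzer" },
          "infix":  { "type": "text", "analyzer": "infix_analyzer" }
        }
      }
    }
  }
}
```

A query would then hit both sub-fields, e.g. a multi_match over displayName.prefix and displayName.infix. Note the ngram part stays at 2-3, so its min/max difference is 1 and the deprecation warning would not apply.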

I've changed the index setting max_ngram_diff as documented at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html to "fix" that deprecation message.
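For reference, that looks something like the following ("titles" and the 1-10 grams are just the example from earlier in this thread):

```json
PUT /titles
{
  "settings": {
    "index": {
      "max_ngram_diff": 9
    },
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      }
    }
  }
}
```

Setting max_ngram_diff to at least max_gram minus min_gram (9 here) makes the warning go away.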

Thanks Markus. Will try that if we need to go back to ngram for business reasons.
