Conditional N-gram token filter?

ehsan_kabiri_33 · February 16, 2021, 5:21pm

Is it possible to have a conditional N-gram token filter? A simple yes or no would already help.

More explanation: I have the text "The animal is cute".
I want to know whether it is possible to use min_gram=3 when word.length >= 3
and min_gram=2 when word.length < 3.
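
To make the rule concrete, here is a minimal sketch in plain Python of the tokens I am hoping for. It is not Elasticsearch or NEST code: the function name `conditional_ngrams` and its parameters are made up for illustration, and the loop only approximates what the ngram token filter does.

```python
def conditional_ngrams(word, short_min=2, long_min=3, max_gram=10):
    """Return the n-grams of `word`, choosing min_gram from the word's length.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Words of 3+ characters use the larger minimum; shorter words use the smaller one.
    min_gram = long_min if len(word) >= 3 else short_min
    grams = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(word):
                break  # the gram would run past the end of the word
            grams.append(word[start:start + size])
    return grams


def analyze(text):
    # Rough stand-in for the standard tokenizer followed by the lowercase filter.
    return {word: conditional_ngrams(word) for word in text.lower().split()}


if __name__ == "__main__":
    print(conditional_ngrams("is"))    # ['is']
    print(conditional_ngrams("cute"))  # ['cut', 'cute', 'ute']
    print(analyze("The animal is cute"))
```

So the short word "is" would still produce a token, while longer words such as "cute" would not produce any 2-character grams.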

Building the inverted index for all tokens with min_gram=2 could be costly on the server, while min_gram=3 misses some short words in Persian.
Any hint is appreciated.

If the answer is yes, I would be very thankful for an example.
This is my current code using NEST. I need to know whether I can apply different N-gram token filter settings to different words.

    var suggestionindexResponse = await client.Indices.CreateAsync("suggestionindex", c => c
        .Settings(s => s
            // max_gram - min_gram = 10 - 2 = 8, so the allowed n-gram difference is raised to 8
            .Setting(UpdatableIndexSettings.MaxNGramDiff, 8)
            .Analysis(a => a
                // Map the zero-width non-joiner (U+200C, the half-space typed with Ctrl+- or Shift+Ctrl+4) to a regular space
                .CharFilters(cf => cf.Mapping("mycharfilter", mp => mp.Mappings(new[] { "\\u200C=>\\u0020" })))
                .Analyzers(aa => aa
                    // Index-time analyzer: includes the n-gram filter
                    .Custom("suggestionanalyzer", ca => ca
                        .CharFilters("mycharfilter")
                        .Filters(new List<string> { "lowercase", "my_stopword", "mynGram", "decimal_digit", "arabic_normalization", "persian_normalization" })
                        .Tokenizer("standard"))
                    // Search-time analyzer: same chain without the n-gram filter
                    .Custom("suggestionsearchanalyzer", ca => ca
                        .CharFilters("mycharfilter")
                        .Filters(new List<string> { "lowercase", "my_stopword", "decimal_digit", "arabic_normalization", "persian_normalization" })
                        .Tokenizer("standard"))
                )
                .TokenFilters(tf => tf
                    .Stop("my_stopword", st => st.StopWords("_persian_"))
                    .NGram("mynGram", td => td
                        .MaxGram(10).MinGram(2))
                )
            ))
        .Map<WordSuggestion>(m => m
            .Dynamic(DynamicMapping.Strict)
            .AutoMap()
            .Properties(ps => ps
                .Text(s => s.Name(n => n.word).Analyzer("suggestionanalyzer").SearchAnalyzer("suggestionsearchanalyzer"))
                .Number(n => n.Name(i => i.Id).Type(NumberType.Integer))
            )));

system · March 16, 2021, 5:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
