Configuring an ngram analyzer with starts-with logic

buka_bidzina · June 11, 2020, 12:23pm

Hello everyone,
I am currently using an ngram analyzer, which tokenizes text like this:
For instance: Apple iPhone -> app, appl, apple, ppl, pple, ple, iph, ipho, hone, and so on.

But I want it to create tokens like this:
Apple iPhone -> app, appl, apple, iph, ipho, iphon, iphone
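
The difference between the two token lists can be illustrated with a small Python sketch. It is only a rough emulation, not Elasticsearch's implementation: the helper names (`split_words`, `ngrams`, `edge_ngrams`) are made up for illustration, and it assumes ASCII letters and digits only. The first list takes substrings from every start position; the second keeps only prefixes, which Elasticsearch calls edge n-grams.

```python
import re

def split_words(text):
    # Roughly mimic token_chars ["letter", "digit"] plus the lowercase filter:
    # any other character separates words (ASCII-only assumption).
    return [w for w in re.split(r"[^0-9A-Za-z]+", text.lower()) if w]

def ngrams(word, min_gram=3, max_gram=20):
    # Every substring of length min_gram..max_gram, at every start position.
    return [word[i:i + n]
            for i in range(len(word))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(word)]

def edge_ngrams(word, min_gram=3, max_gram=20):
    # Only substrings anchored at the start of the word (prefixes).
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

text = "Apple iPhone"
all_grams = [g for w in split_words(text) for g in ngrams(w)]
prefix_grams = [g for w in split_words(text) for g in edge_ngrams(w)]

print(all_grams)     # includes mid-word tokens such as "ppl" and "hone"
print(prefix_grams)  # ['app', 'appl', 'apple', 'iph', 'ipho', 'iphon', 'iphone']
```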

Index Settings

PUT index
    {
      "settings": {
        "index": {
          "max_ngram_diff": 50
        },
        "analysis": {
          "filter": {
            "custom_shingle": {
              "max_shingle_size": "2",
              "min_shingle_size": "2",
              "output_unigrams": true,
              "type": "shingle"
            },
            "my_char_filter": {
              "pattern": " ",
              "type": "pattern_replace",
              "replacement": ""
            }
          },
          "analyzer": {
            "nGram_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "nGram_tokenizer"
            },
            "bigram_analyzer": {
              "filter": [
                "lowercase",
                "custom_shingle",
                "my_char_filter"
              ],
              "tokenizer": "standard"
            }
          },
          "tokenizer": {
            "nGram_tokenizer": {
              "type": "ngram",
              "min_gram": 3,
              "max_gram": 20,
              "token_chars": [
                "letter",
                "digit"
              ]
            }
          }
        }
      }
    }

buka_bidzina · June 11, 2020, 1:13pm

Changing the tokenizer type from ngram to edge_ngram produces the tokens I wanted:

    "nGram_tokenizer": {
      "type": "edge_ngram",
      "min_gram": 3,
      "max_gram": 20,
      "token_chars": [
        "letter",
        "digit"
      ]
    }
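
One way to check what an analyzer actually emits is the `_analyze` API. The Python sketch below only builds the request and does not send it. The host `http://localhost:9200` and the helper name `build_analyze_request` are assumptions for illustration; the index name `index` and the analyzer name `nGram_analyzer` come from the settings posted above.

```python
import json
import urllib.request

def build_analyze_request(text, index="index", analyzer="nGram_analyzer",
                          host="http://localhost:9200"):
    # Build (but do not send) a POST to the index's _analyze endpoint.
    # `host` is an assumed local cluster address.
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_analyze_request("Apple iPhone")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To run against a live cluster (assumed reachable at `host`):
# with urllib.request.urlopen(req) as resp:
#     tokens = [t["token"] for t in json.load(resp)["tokens"]]
#     print(tokens)
```

With the edge_ngram tokenizer in place, the returned tokens should contain only word prefixes of three or more characters.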

system · July 9, 2020, 1:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
