Combining ngram tokenizer with stopwords

Hello everyone

I'm trying to create a custom tokenizer with ngram.

GET /_analyze
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 3,
    "token_chars": [
      "letter",
      "digit",
      "punctuation",
      "symbol",
      "whitespace"
    ]
  },
  "filter": [
    {
      "type": "stop",
      "ignore_case": true,
      "stopwords": [ "HELP" ]
    },
    {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 3
    }
  ],
  "text": "HELP TEST VALUE"
}

Of course it doesn't filter "HELP" out, because the ngram tokenizer has already split the text into trigrams before the stop filter runs, so the filter never sees the whole word. If I use "type": "standard" for the tokenizer instead, the stop filter works, but the standard tokenizer splits on whitespace first, so the ngram filter only produces ngrams within each word, which is not the result we want.

Can someone help with this?

Thanks


Hi @Dimitri_Gamkrelidze

According to the documentation, this behavior is correct: the tokenizer always runs before the token filters.
What result do you expect? You could use a tokenizer other than ngram to generate the tokens and apply ngram only in the filter step.
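For example, here is a minimal sketch of such a custom analyzer defined in the index settings (the index name and the my_* names are just placeholders):

PUT /my-index
// "my-index", "my_stop", "my_ngram" and "stop_then_ngram" are placeholder names
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": [ "HELP" ]
        },
        "my_ngram": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "stop_then_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "my_stop", "my_ngram" ]
        }
      }
    }
  }
}

Keep in mind that with a word-based tokenizer the ngrams will not cross word boundaries, which may or may not match what you need.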