Combining ngram tokenizer with stopwords

Hello everyone,

I'm trying to create a custom tokenizer with ngram.

GET /_analyze
 "tokenizer": {
    "token_chars": [
  "filter": [ {
  "text": "HELP TEST VALUE"

Of course it's not filtering "HELP" out, because the ngram tokenizer has already split the text before the filter runs.
If I use "type": "standard" in the tokenizer, then the ngram filter works, but the tokenizer produces a result that is not what we want, since ngram then only works as a filter on that result.

Can someone help in this case?



Hi @Dimitri_Gamkrelidze

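According to the documentation, this behavior is expected: token filters only see the tokens that the tokenizer emits, so a stop filter cannot match "help" once the ngram tokenizer has already split it into fragments.
What do you expect as a result? I believe you can use a tokenizer other than ngram to generate the tokens, and apply ngram only in the filter step.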
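
For illustration, here is a minimal sketch of that ordering. It rests on several assumptions that are not from your original request: the `whitespace` tokenizer, the lowercase stopword `help`, and the `min_gram`/`max_gram` values are all placeholders, so swap in whatever matches your real settings.

```
# Sketch only: tokenizer choice, stopword list and gram sizes are assumptions
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    { "type": "stop", "stopwords": ["help"] },
    { "type": "ngram", "min_gram": 2, "max_gram": 3 }
  ],
  "text": "HELP TEST VALUE"
}
```

Filters run in the order listed, so the stop filter sees the whole word "help" and drops it before the ngram filter runs. The response should then contain only grams built from "test" and "value". If the whitespace tokenizer does not split your text the way you need, a different non-ngram tokenizer can take its place without changing the filter chain.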