Consider the example below:
GET _analyze
{
  "tokenizer": {
    "pattern": ",",
    "type": "pattern"
  },
  "filter": [
    {
      "type": "edge_ngram",
      "min_gram": 2,
      "max_gram": 10
    },
    {
      "type": "stop",
      "stopwords": [ "ab" ]
    }
  ],
  "text": "ab bcd,cde"
}
With the configuration above, I expect the following tokens:
["bc", "bcd", "cd", "cde"]
but I get this list of tokens instead:
["ab", "ab b", "ab bc", "ab bcd", "cd", "cde"]
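To see where each token comes from, the analysis chain can be mimicked in a few lines of plain Python. This is a rough sketch, not Elasticsearch's implementation: the helper names (`pattern_tokenize`, `edge_ngram`, `stop`, `analyze`) are hypothetical, and it assumes the tokenizer runs first and the filters then run in the order they appear in the "filter" array. Because the pattern splits only on ",", "ab bcd" stays a single token, so edge_ngram emits prefixes of the whole string and the stop filter only ever sees those prefixes.

```python
import re
from functools import partial


def pattern_tokenize(text, pattern):
    # Rough stand-in for the "pattern" tokenizer: split on the regex, drop empty pieces.
    return [t for t in re.split(pattern, text) if t]


def edge_ngram(tokens, min_gram, max_gram):
    # Replace each token with its prefixes of length min_gram..max_gram
    # (capped at the token's own length).
    out = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            out.append(t[:n])
    return out


def stop(tokens, stopwords):
    # Drop tokens that exactly match a stopword.
    return [t for t in tokens if t not in stopwords]


def analyze(text, pattern, filters):
    # Assumption: filters run in list order, like an analyzer's "filter" array.
    tokens = pattern_tokenize(text, pattern)
    for f in filters:
        tokens = f(tokens)
    return tokens


grams = partial(edge_ngram, min_gram=2, max_gram=10)
stop_ab = partial(stop, stopwords={"ab"})

# Chain as in the question: split on "," only, edge_ngram before stop.
as_configured = analyze("ab bcd,cde", ",", [grams, stop_ab])

# Variant: split on commas or whitespace, and remove stopwords before edge_ngram.
reordered = analyze("ab bcd,cde", r"[\s,]+", [stop_ab, grams])

print(as_configured)
print(reordered)
```

In this simulation the first chain returns prefixes of the unsplit token "ab bcd" (the bare "ab" is the only one the stop filter can match), while splitting on commas or whitespace and removing stopwords before generating n-grams yields ["bc", "bcd", "cd", "cde"]. Whether a real index behaves the same way is worth confirming with _analyze.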
What is wrong with my edge_ngram configuration?
How can I get the first set of tokens using edge_ngram with a custom tokenizer?