Ngram token filter omits tokens shorter than min_gram


(Krishna Kumar Perumalla) #1

Hi,

We are using the following settings

curl -XPUT 'localhost:9200/documents5' -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "pp_ngram": {
          "type": "nGram", "min_gram": 3, "max_gram": 50
        }
      },
      "analyzer": {
        "name_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["word_delimiter", "lowercase", "pp_ngram"]
        }
      }
    }
  }
}'

curl 'localhost:9200/documents5/_analyze?pretty=1&analyzer=name_index_analyzer' -d 'API RP 65 Part2.pdf' | grep -w token
"token" : "api",
"token" : "par",
"token" : "part",
"token" : "art",
"token" : "pdf",

We are missing the "RP" and "65" tokens. Shouldn't the filter simply pass through
tokens that are shorter than min_gram?
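As far as I can tell, an nGram token filter emits no grams at all for tokens shorter than min_gram, so two-character tokens like "RP" and "65" are dropped entirely rather than passed through. One possible workaround, assuming two characters is the shortest token you need (this creates a new index only as an illustration; single-character tokens would still be dropped):

```shell
# Sketch: lowering min_gram to 2 lets a two-character token such as "rp"
# survive as its own 2-gram. The index name "documents5_v2" is a placeholder.
curl -XPUT 'localhost:9200/documents5_v2' -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "pp_ngram": {
          "type": "nGram", "min_gram": 2, "max_gram": 50
        }
      },
      "analyzer": {
        "name_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["word_delimiter", "lowercase", "pp_ngram"]
        }
      }
    }
  }
}'
```

The trade-off is a larger index, since every token now also produces 2-grams.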

Is it possible to accomplish this with some other configuration or setting?

/Krishna

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a0bdf64c-ce47-4b1d-bd45-8754288eadcb%40googlegroups.com.


(system) #2