I have my custom analyzer as below. But I dont understand how to achieve my goal.
My goal is that I want to have whitespace separated inverted index but also I want to have autocomplete feature after user enters min 3 chars. For that I though to combine word_delimiter and edgeNGram tokens as below
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": [
"standard",
"lowercase",
"my_word_delimiter",
"my_edge_ngram_analyzer"
],
"type": "custom"
}
},
"filter": {
"my_word_delimiter": {
"catenate_all": true,
"type": "word_delimiter"
},
"my_edge_ngram_analyzer": {
"min_gram": 3,
"max_gram": 10,
"type": "edgeNGram"
}
}
}
}
}
}
This will give result for "Brother TN-200" as below. But I was expecting "tn" to be also in the reverted index as I have word_delimiter token. why is it not in the inverted index? How can I achieve this?
url -XGET "localhost:9200/myIndex/_analyze?analyzer=my_analyzer&pr
etty=true" -d "Brother TN-200"
{
{
"token" : "bro",
"start_offset" : 14,
"end_offset" : 21,
"type" : "word",
"position" : 2
}, {
"token" : "brot",
"start_offset" : 14,
"end_offset" : 21,
"type" : "word",
"position" : 2
}, {
"token" : "broth",
"start_offset" : 14,
"end_offset" : 21,
"type" : "word",
"position" : 2
}, {
"token" : "brothe",
"start_offset" : 14,
"end_offset" : 21,
"type" : "word",
"position" : 2
}, {
"token" : "brother",
"start_offset" : 14,
"end_offset" : 21,
"type" : "word",
"position" : 2
}, {
"token" : "tn2",
"start_offset" : 22,
"end_offset" : 28,
"type" : "word",
"position" : 3
}, {
"token" : "tn20",
"start_offset" : 22,
"end_offset" : 28,
"type" : "word",
"position" : 3
}, {
"token" : "tn200",
"start_offset" : 22,
"end_offset" : 28,
"type" : "word",
"position" : 3
}, {
"token" : "200",
"start_offset" : 25,
"end_offset" : 28,
"type" : "word",
"position" : 4
}]
}