I am implementing a completion suggester on multiple fields whose values contain alphanumeric characters and colons (for example, AA:890090:xyz:9090).
With the completion suggester's default analyzer (the simple analyzer), I get no suggestions for input such as AA:890, because the simple analyzer splits on every non-letter character and keeps only the alphabetic tokens.
To work around this, we switched to an edge n-gram analyzer. That fixes the suggestions, but the index is 15 to 17 times larger than with the default analyzer.
For example, the index is around 3 GB with the default analyzer and grows to about 50 GB with the edge n-gram analyzer.
Why is the index so much larger with the edge n-gram analyzer than with the default analyzer? The Elasticsearch mapping is shown below.
Is there a better way to achieve the same result?
Sample data:
AA:890090:xyz:9090
Sample mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 40
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field1Suggest": {
          "type": "completion",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace"
        },
        "field2Suggest": {
          "type": "completion",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}
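
As a rough check on the size increase, here is a minimal Python sketch that emulates the analyzer chain from the mapping above (whitespace tokenizer, lowercase, edge n-grams with min_gram 3 and max_gram 40) on the sample value. It does not call Elasticsearch; the helper names `edge_ngrams` and `analyze` are made up for illustration, and the real filter may differ in details.

```python
def edge_ngrams(token, min_gram=3, max_gram=40):
    """Emulate an edge_ngram token filter: prefixes of length min_gram..max_gram."""
    upper = min(max_gram, len(token))
    # Tokens shorter than min_gram yield an empty list.
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=3, max_gram=40):
    """Emulate: whitespace tokenizer -> lowercase -> edge n-gram filter."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return grams


sample = "AA:890090:xyz:9090"
grams = analyze(sample)

print(len(grams))   # 16
print(grams[:3])    # ['aa:', 'aa:8', 'aa:89']
```

Under this emulation, the single 18-character sample value becomes 16 indexed terms instead of one. That would be roughly in line with the 15 to 17 times growth I am seeing, assuming the index size scales with the number of terms per value, which I have not verified.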