Hi,
For my use case I need to use both an edge-ngram token filter and a synonym filter, and then highlight the matching tokens in the search results.
Since I need both edge-ngram and synonyms, I have to use the edge-ngram token filter rather than the edge-ngram tokenizer, and apply the filters in the order synonym filter -> edge-ngram token filter.
However, this creates an issue with highlighting (which, I believe, comes from the lack of position increments).
With the setup below, when I search for "index", even "industrial" gets highlighted as a whole.
PUT test1/
{
  "settings": {
    "analysis": {
      "filter": {
        "my_edge_ngram_filter": {
          "token_chars": [
            "letter",
            "digit"
          ],
          "min_gram": "1",
          "type": "edgeNGram",
          "max_gram": "12"
        },
        "synonym_normal": {
          "type": "synonym",
          "synonyms": [
            "index, bond",
            "industrial, industry"
          ]
        }
      },
      "tokenizer": {
        "my_edge_ngram_tokenizer": {
          "token_chars": [
            "letter",
            "digit"
          ],
          "min_gram": "1",
          "type": "edgeNGram",
          "max_gram": "12"
        }
      },
      "analyzer": {
        "synonym_edgengram": {
          "filter": [
            "synonym_normal",
            "my_edge_ngram_filter"
          ],
          "tokenizer": "whitespace",
          "type": "custom"
        },
        "edgengram_tokenizer": {
          "tokenizer": "my_edge_ngram_tokenizer",
          "type": "custom"
        }
      }
    }
  },
  "mappings": {
    "test1": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "field_synonym_edgengram": {
              "type": "text",
              "analyzer": "synonym_edgengram",
              "fielddata": true
            },
            "field_edgengram_tokenizer": {
              "type": "text",
              "analyzer": "edgengram_tokenizer",
              "fielddata": true
            }
          }
        }
      }
    }
  }
}
Indexing one document:
POST test1/test1/
{
  "name": "index industrial"
}
Query1 with highlight:
GET test1/_search
{
  "query": {
    "match": {
      "name.field_synonym_edgengram": "index "
    }
  },
  "highlight": {
    "fields": {
      "name.field_synonym_edgengram": {}
    }
  }
}
Result: even "industrial" gets highlighted as a whole, instead of just "ind".
Now, if I run the same query using the edge-ngram tokenizer (and without synonyms):
Query2:
GET test1/_search
{
  "query": {
    "match": {
      "name.field_edgengram_tokenizer": "index "
    }
  },
  "highlight": {
    "fields": {
      "name.field_edgengram_tokenizer": {}
    }
  }
}
It gets properly highlighted.
I believe this is due to the position increments produced by the edge-ngram tokenizer, which the edge-ngram token filter does not produce. Is there any way to get around this highlighting issue?
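One way to check this hypothesis would be to compare the tokens each analyzer emits using the _analyze API, which reports a position, start_offset and end_offset for every token. A minimal sketch, assuming the test1 index above has been created as shown (the sample text is simply the value of the indexed document):

```console
GET test1/_analyze
{
  "analyzer": "synonym_edgengram",
  "text": "index industrial"
}

GET test1/_analyze
{
  "analyzer": "edgengram_tokenizer",
  "text": "index industrial"
}
```

If the grams from the token filter all report the offsets of the full word while the grams from the tokenizer report their own offsets, that would be consistent with the whole-word highlight in Query1, since highlighters mark text using token offsets.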