After studying the Lucene highlighting that Elasticsearch is based on, I still have an unanswered question.
I want to use the standard tokenizer at index/search time (to make proper use of the token filters) and also use an edge n-gram tokenizer for highlighting (I need to highlight the exact searched terms).
Without using the multi-field mapping option, is there a way to force the highlighter to use a different analyzer/tokenizer for its re-analysis stage only?
And if not, is there a way to write a custom tokenizer as a plugin that consists of a custom pipeline (a tokenizer plus a collection of token filters)?
Example to clarify the desired outcome:
index/search analyzer:
"index_search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"word_delimiter"
]
}
highlight analyzer (edge_ngram_3_8 is a placeholder name for a custom edge_ngram tokenizer with min_gram: 3, max_gram: 8):
"highlight_analyzer": {
    "type": "custom",
    "tokenizer": "edge_ngram_3_8",
    "filter": [
        "lowercase",
        "word_delimiter_graph"
    ]
}
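To make the desired outcome concrete, here is a rough sketch of how I picture both analyzers living in one index's settings; the index name my_index and the tokenizer name edge_ngram_3_8 are placeholders of my own, and the open question is how to make the highlighter pick up highlight_analyzer without mapping a second field:

PUT my_index
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "edge_ngram_3_8": {
                    "type": "edge_ngram",
                    "min_gram": 3,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "index_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "word_delimiter"
                    ]
                },
                "highlight_analyzer": {
                    "type": "custom",
                    "tokenizer": "edge_ngram_3_8",
                    "filter": [
                        "lowercase",
                        "word_delimiter_graph"
                    ]
                }
            }
        }
    }
}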
indexed: "He123"
search: "123"
highlight: "He123
"