The highlighters are great because they help you figure out why your query matched a document. Unfortunately, when they are used in conjunction with any kind of analyzer that stems the words in the query, it becomes difficult to figure out WHY the query matched a document. The FVH highlighter and the pre/post tags help a bit, but the docs do not make clear what the expected behavior of the FVH highlighter is.
I wrote a post a while ago about the topic but got no discussion going, so I've written a longer deep dive on the subject here:
https://jack-hodkinson.medium.com/reverse-engineering-elasticsearch-highlights-e36ec4164e84
Thanks for the write-up.
One approach for more transparency into which search terms matched might be to use the highlighter that ships with the annotated_text field, which is installed as an extra plugin. Although it's designed for use on annotated_text fields, it also works with text fields, and it uses a markdown-like syntax for adding annotations to text. For example, given this doc:
{
  "text": "The new hot technology has emerged!"
}
and this query:
{
  "query": {
    "bool": {
      "should": [
        {"match": {"text": "fabulous"}},
        {"match": {"text": "new"}}
      ],
      "must": [
        {"match": {"text": "technology"}},
        {"wildcard": {"text": "emerg*"}}
      ]
    }
  },
  "highlight": {
    "type": "annotated",
    "fields": {
      "text": {}
    }
  }
}
The marked-up result shows which query term matched each highlighted section of text:
"highlight": {
  "text": [
    "The [new](_hit_term=new) hot [technology](_hit_term=technology) has [emerged](_hit_term=text%3Aemerg*)!"
  ]
}
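If you want to consume that markup programmatically, something like the following might work. It is only a rough sketch: the `hit_terms` helper and the regex are my own illustration, not part of the plugin, and I'm assuming each annotation is a URL-encoded `_hit_term=...` value (with `&` separating several values on the same span), which is what the output above suggests.

```python
import re
from urllib.parse import unquote

# Matches "[visible text](annotation)" pairs in a highlighted fragment.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def hit_terms(fragment):
    """Return (highlighted text, [matched terms]) pairs for one fragment.

    Hypothetical helper: assumes annotations look like "_hit_term=<url-encoded term>",
    optionally joined with "&" when several terms hit the same span.
    """
    results = []
    for text, annotation in ANNOTATION.findall(fragment):
        terms = [
            unquote(part.split("=", 1)[1])  # decode e.g. "%3A" back to ":"
            for part in annotation.split("&")
            if part.startswith("_hit_term=")
        ]
        if terms:
            results.append((text, terms))
    return results

fragment = (
    "The [new](_hit_term=new) hot [technology](_hit_term=technology) "
    "has [emerged](_hit_term=text%3Aemerg*)!"
)
for text, terms in hit_terms(fragment):
    print(text, "<-", terms)
```

Each highlighted span is paired with the query term(s) that hit it, so a stemmed or wildcard match like "emerged" can be traced back to the `emerg*` clause.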
That could be a nice solution. I've been meaning to try out annotated text too. I'll give it a go. Thanks!