There appears to be an issue with the highlighting of matched terms at snippet boundaries.
Steps to reproduce:
- Create index and field mappings
PUT /test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "analyzer": "english",
        "fields": {
          "exactmatch": {
            "type": "text",
            "term_vector": "with_positions_offsets",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
- Index a document
POST test/_doc/
{
  "title": "The concerns span an array of emergencies over four administrations, including infectious-disease outbreaks such as Ebola and Zika and extreme weather events."
}
- As an extreme case, search for the entire field text
POST /test/_search
{
  "query": {
    "multi_match": {
      "fields": [
        "title",
        "title.exactmatch^10"
      ],
      "operator": "and",
      "query": "The concerns span an array of emergencies over four administrations, including infectious-disease outbreaks such as Ebola and Zika and extreme weather events.",
      "type": "best_fields",
      "zero_terms_query": "all"
    }
  },
  "highlight": {
    "number_of_fragments": 2,
    "fragment_size": 70,
    "boundary_max_scan": 20,
    "boundary_scanner": "word",
    "encoder": "html",
    "fields": {
      "title": {
        "matched_fields": [
          "title",
          "title.exactmatch"
        ],
        "type": "fvh"
      }
    }
  }
}
The term "including" is not highlighted in the returned snippets, even though it is part of the query:
"highlight" : {
  "title" : [
    "<em>The</em> <em>concerns</em> <em>span</em> <em>an</em> <em>array</em> <em>of</em> <em>emergencies</em> <em>over</em> <em>four</em> <em>administrations</em>, including",
    "including <em>infectious</em>-<em>disease</em> <em>outbreaks</em> <em>such</em> <em>as</em> <em>Ebola</em> <em>and</em> <em>Zika</em> <em>and</em> <em>extreme</em> weather"
  ]
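To make the gap easier to spot, here is a small Python sketch that lists query terms that appear in a returned fragment outside of `<em>` tags. It is only a checking aid and not part of Elasticsearch; the helper name `unhighlighted_terms` is made up for this post, and it assumes the default `<em>` pre/post tags and simple word tokenization.

```python
import re

# Assumption: the highlighter uses the default <em>...</em> tags.
EM_SPAN = re.compile(r"<em>.*?</em>")
WORD = re.compile(r"\w+")


def unhighlighted_terms(fragment, query):
    """Return query terms that occur in the fragment outside <em> tags."""
    query_terms = {t.lower() for t in WORD.findall(query)}
    # Remove the highlighted spans, then inspect the words that remain.
    plain = EM_SPAN.sub(" ", fragment)
    return [w for w in WORD.findall(plain) if w.lower() in query_terms]


query = (
    "The concerns span an array of emergencies over four administrations, "
    "including infectious-disease outbreaks such as Ebola and Zika and "
    "extreme weather events."
)

# The two fragments exactly as returned in the response above.
fragments = [
    "<em>The</em> <em>concerns</em> <em>span</em> <em>an</em> <em>array</em> "
    "<em>of</em> <em>emergencies</em> <em>over</em> <em>four</em> "
    "<em>administrations</em>, including",
    "including <em>infectious</em>-<em>disease</em> <em>outbreaks</em> "
    "<em>such</em> <em>as</em> <em>Ebola</em> <em>and</em> <em>Zika</em> "
    "<em>and</em> <em>extreme</em> weather",
]

for fragment in fragments:
    print(unhighlighted_terms(fragment, query))
# ['including']
# ['including', 'weather']
```

Run against the response above, it also flags "weather" at the end of the second fragment, so the terms that lose highlighting are the ones sitting on a fragment edge.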
Is this inconsistent behavior a limitation of the FVH highlighter, or is it a bug?
Elasticsearch version: 7.10.1