Hi!
We recently improved our hit highlighting by using fvh (the Lucene Fast Vector Highlighter). The highlighting request looks like this:
{
  "query": {...},
  "highlight": {
    "fields": {
      "searchableName": {
        "number_of_fragments": 0,
        "type": "fvh",
        "matched_fields": [
          "searchableName",
          "searchableName.exact",
          "searchableName.replace_vowels",
          "searchableName.smart",
          "searchableName.smart.reverse",
          "searchableName.standard_clean",
          "searchableName.standard_clean_reverse"
        ]
      }
    }
  }
}
This lets us combine highlights from multiple fields that use different analyzers. Each of the fields listed above uses with_positions_offsets term vectors (as instructed in Highlighting | Elasticsearch Guide [7.15] | Elastic).
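For reference, here is a minimal sketch of how such a multi-field can be mapped with term vectors, written as the body of a create-index request. It is an illustration under assumptions, not our actual mapping: only searchableName and one sub-field (exact) are shown, and the built-in whitespace analyzer on exact is a placeholder for whatever analyzer that sub-field really uses. The other sub-fields (replace_vowels, smart, standard_clean, and so on) would be declared the same way, each with its own custom analyzer defined in the index settings.

```json
{
  "mappings": {
    "properties": {
      "searchableName": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "fields": {
          "exact": {
            "type": "text",
            "analyzer": "whitespace",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}
```

The Elasticsearch term_vector documentation notes that with_positions_offsets doubles the size of a field's index, and in our setup that applies to each of the seven matched fields.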
However, this seems to have a big impact on performance. We turned off the advanced highlighting from 1 pm to 3 pm (i.e. the hourly time buckets 13:00 and 14:00). The percentiles below show search durations (the took value reported by Elasticsearch, in milliseconds):
 time_bucket                  | search_count | percentile_5 | percentile_25 | median | percentile_75 | percentile_95
------------------------------+--------------+--------------+---------------+--------+---------------+---------------
 2021-11-30 11:00:00+00       |         1368 |           19 |           120 |    941 |          1168 |          2176
 2021-11-30 12:00:00+00       |         1134 |           20 |           109 |   1010 |          1189 |          2647
 2021-11-30 13:00:00+00 (off) |         1384 |           13 |            32 |    249 |           306 |          1092
 2021-11-30 14:00:00+00 (off) |         1290 |           13 |            40 |    252 |           304 |          1086
 2021-11-30 15:00:00+00       |         1912 |           16 |           242 |   1105 |          1374 |          6278
 2021-11-30 16:00:00+00       |         1333 |           16 |           123 |   1099 |          1284 |          5315
With highlighting turned off, searches clearly became faster: roughly four times faster at the median and the 75th percentile, and two to six times faster at the 95th percentile. Why is that? Each search returns only the top 50 hits, so how can highlighting be so slow? We also noticed that highlighting slows down queries even when a particular search does not use the searchableName fields at all.
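Dividing the highlighting-on percentiles (the 11:00, 12:00, 15:00 and 16:00 buckets) by the highlighting-off ones (the 13:00 and 14:00 buckets), and taking the smallest and largest ratio in each column, gives the following ranges:

```latex
\begin{aligned}
\text{median:} &\quad \tfrac{941}{252} \approx 3.7 \;\ldots\; \tfrac{1105}{249} \approx 4.4 \\
\text{75th percentile:} &\quad \tfrac{1168}{306} \approx 3.8 \;\ldots\; \tfrac{1374}{304} \approx 4.5 \\
\text{95th percentile:} &\quad \tfrac{2176}{1092} \approx 2.0 \;\ldots\; \tfrac{6278}{1086} \approx 5.8
\end{aligned}
```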
We are running Elasticsearch 7.12.1 on a cluster of four 64 GB physical servers.