Fast vector highlighter (fvh) making searches slower

Hi!

We recently improved our hit highlighting by utilizing fvh (Lucene Fast Vector Highlighter). The highlighting is done as follows:

{
  "query": {...},
  "highlight": {
    "fields": {
      "searchableName": {
        "number_of_fragments": 0,
        "type": "fvh",
        "matched_fields": [
          "searchableName",
          "searchableName.exact",
          "searchableName.replace_vowels",
          "searchableName.smart",
          "searchableName.smart.reverse",
          "searchableName.standard_clean",
          "searchableName.standard_clean_reverse"
        ]
      }
    }
  }
}

This enables us to combine highlights from multiple fields that use different analyzers. Each of the above-mentioned fields uses with_positions_offsets term vectors (as instructed in Highlighting | Elasticsearch Guide [7.15] | Elastic).
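For reference, a minimal sketch of how such a mapping body might be built. It assumes only two of the sub-fields listed above (exact and standard_clean) and leaves out analyzers, since the thread does not show them; the helper names are hypothetical.

```python
import json

# Assumption: only two of the sub-fields from the request above are shown,
# and analyzers are omitted because the thread does not list them.
SUB_FIELDS = ["exact", "standard_clean"]


def text_field_with_offsets():
    """Return a text field definition that stores positions and offsets."""
    return {"type": "text", "term_vector": "with_positions_offsets"}


def build_mapping(sub_fields):
    """Build a mapping body where the root field and each sub-field keep term vectors."""
    root = text_field_with_offsets()
    root["fields"] = {name: text_field_with_offsets() for name in sub_fields}
    return {"mappings": {"properties": {"searchableName": root}}}


mapping_body = build_mapping(SUB_FIELDS)
print(json.dumps(mapping_body, indent=2))
```

The printed JSON is the kind of body that would be sent when creating the index; every field named in matched_fields needs the same term_vector setting.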

However, this seems to have a big impact on performance. We turned the advanced highlighting off from 1pm to 3pm (i.e. the hourly time buckets 13:00 and 14:00). The percentiles show search durations (Elasticsearch took, in milliseconds):

         time_bucket          | search_count | percentile_5 | percentile_25 | median | percentile_75 | percentile_95
------------------------------+--------------+--------------+---------------+--------+---------------+---------------
 2021-11-30 11:00:00+00       |         1368 |           19 |           120 |    941 |          1168 |          2176
 2021-11-30 12:00:00+00       |         1134 |           20 |           109 |   1010 |          1189 |          2647
 2021-11-30 13:00:00+00 (off) |         1384 |           13 |            32 |    249 |           306 |          1092
 2021-11-30 14:00:00+00 (off) |         1290 |           13 |            40 |    252 |           304 |          1086
 2021-11-30 15:00:00+00       |         1912 |           16 |           242 |   1105 |          1374 |          6278
 2021-11-30 16:00:00+00       |         1333 |           16 |           123 |   1099 |          1284 |          5315

We can clearly see that searches became roughly four times faster at the median and 75th percentile with the highlighting off (and two to six times faster at the 95th percentile). Why is that? Each search returns only the top 50 hits, so how can the highlighting be so slow? We also noticed that the highlighting slows queries down even when a particular search does not use the searchableName fields at all.
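To put a number on the difference, here is a small sketch that averages each percentile over the hours with highlighting on and the hours with it off, then takes the ratio. The rows are copied from the table above; averaging hourly percentiles is only a rough summary, not a rigorous statistic.

```python
from statistics import mean

# Rows copied from the table above:
# (hour, highlighting_on, p5, p25, median, p75, p95), durations in milliseconds.
ROWS = [
    ("11:00", True, 19, 120, 941, 1168, 2176),
    ("12:00", True, 20, 109, 1010, 1189, 2647),
    ("13:00", False, 13, 32, 249, 306, 1092),
    ("14:00", False, 13, 40, 252, 304, 1086),
    ("15:00", True, 16, 242, 1105, 1374, 6278),
    ("16:00", True, 16, 123, 1099, 1284, 5315),
]
PERCENTILES = ["p5", "p25", "median", "p75", "p95"]


def slowdown_by_percentile(rows):
    """Ratio of mean duration with highlighting on to mean duration with it off."""
    ratios = {}
    for index, name in enumerate(PERCENTILES, start=2):
        on = mean(row[index] for row in rows if row[1])
        off = mean(row[index] for row in rows if not row[1])
        ratios[name] = on / off
    return ratios


ratios = slowdown_by_percentile(ROWS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower with fvh highlighting on")
```

On these figures the slowdown is concentrated from the 25th percentile upward, while the 5th percentile barely moves.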

We are running Elasticsearch 7.12.1 in a cluster of four 64GB physical servers.

The numbers for fvh are lower, so that would mean it’s faster, not slower?

Sorry, I phrased the question badly (fixed now). So we turned off the FVH to confirm it was the highlighting that's causing the slowness.

So the diff we are seeing is between FVH and no highlighting?
We can obviously expect performance differences there so a better test might be FVH versus another choice of highlighter.

Despite its name, FVH is not always faster. IIRC it relies on pre-computed disk stores of offsets to avoid re-tokenizing document strings. This might help avoid heavy costs of analyzing lengthy texts at query time but the trade-off doesn't always work in your favour. Maybe the costs of retrieving the pre-computed offsets from disk are higher than just re-analyzing short strings for highlighting. We don't know because your benchmarking was FVH vs no highlighting.

I'd suggest trying a different choice of highlighter implementation and benchmarking FVH versus that for a more realistic comparison.

It's hard to do a meaningful comparison because fvh is the only highlighter capable of combining highlights from multiple fields... :thinking:

However, I can compare e.g. the following two highlight patterns:

Default highlighter (~1.8s response times):

{
  "size": 50,
  "query": { ... },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "searchableName.*": {
        "number_of_fragments": 0
      }
    }
  }
}

FVH highlighter (~2.5s response times):

{
  "size": 50,
  "query": { ... },
  "highlight": {
    "fields": {
      "searchableName": {
        "number_of_fragments": 0,
        "type": "fvh",
        "matched_fields": [
          "searchableName",
          "searchableName.exact",
          "searchableName.replace_vowels",
          "searchableName.smart",
          "searchableName.smart.reverse",
          "searchableName.standard_clean",
          "searchableName.standard_clean_reverse"
        ]
      }
    }
  }
}

The default highlighter (although unable to combine highlights) responds noticeably faster: the FVH request takes about 40% longer (roughly 2.5s vs 1.8s).
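For anyone repeating this comparison, a rough sketch of how the two request bodies could be generated from one shared query, together with the relative difference between the two timings quoted above. The match query is a hypothetical placeholder (the real query is not shown in the thread), and sending the requests to the cluster is left out.

```python
MATCHED_FIELDS = [
    "searchableName",
    "searchableName.exact",
    "searchableName.replace_vowels",
    "searchableName.smart",
    "searchableName.smart.reverse",
    "searchableName.standard_clean",
    "searchableName.standard_clean_reverse",
]


def default_highlight():
    """Highlight every searchableName.* field separately with the default highlighter."""
    return {
        "require_field_match": False,
        "fields": {"searchableName.*": {"number_of_fragments": 0}},
    }


def fvh_highlight():
    """Combine matches from all analyzed variants into one fvh highlight."""
    return {
        "fields": {
            "searchableName": {
                "number_of_fragments": 0,
                "type": "fvh",
                "matched_fields": list(MATCHED_FIELDS),
            }
        }
    }


def build_requests(query, size=50):
    """Return both request bodies so they differ only in the highlight section."""
    return {
        "default": {"size": size, "query": query, "highlight": default_highlight()},
        "fvh": {"size": size, "query": query, "highlight": fvh_highlight()},
    }


def relative_increase(baseline, other):
    """Fraction by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline


# Hypothetical placeholder query; substitute the real one.
requests_by_name = build_requests({"match": {"searchableName": "example"}})
increase = relative_increase(1.8, 2.5)
print(f"fvh takes {increase:.0%} longer than the default highlighter")
```

Keeping the query and size identical in both bodies means any difference in took comes from the highlight section alone.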

This is an old highlighter benchmarking thread but may still be of interest.

We are also working on small fields, typically a few words in each.
