Highlighting performance issues with stored field and fvh highlighter

Hello,

I'm having an issue very similar to the one in the topic "Elastic query takes over 1 minute due to time spent in 'HighlightPhase'".

I have documents with an optional attachments text field, which can be quite large for some documents (up to 20 MB).

I have a query that searches for a match in the title field and highlights only that field. If the query matches only smaller documents, the whole query takes at most 30 ms (measured in the Kibana Dev Tools). If it matches many larger documents, the same query takes over 1 s.

The highlighted field is stored.

The simplified query:

{
  "profile": true, 
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "_id": [
              /* 20 ids of documents with either large or small attachments */
            ]
          }
        }
      ],
      "should": [
        {
          "simple_query_string": {
            "query": "cat",
            "fields": [
              "title"
            ]
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {
        "type": "fvh"
      }
    }
  },
  "_source": false
}

The mapping of the title field:

"title": {
  "type": "text",
  "store": true,
  "fields": {
    "direct": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "german_simple"
    },
    "exact": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "german_exact"
    },
    "suggest": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "german_trigram_search_suggests"
    }
  },
  "term_vector": "with_positions_offsets",
  "analyzer": "german_decompound"
},

The additional time is spent entirely in the "HighlightPhase" of the profile output:

"Fast" documents:

{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 10123678,
  "breakdown": {
    "process_count": 10,
    "process": 10119747,
    "next_reader": 3931,
    "next_reader_count": 5
  }
},
...
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 10709348,
  "breakdown": {
    "process_count": 10,
    "process": 10705987,
    "next_reader": 3361,
    "next_reader_count": 4
  }
},

"Slow" documents:

{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 900147051,
  "breakdown": {
    "process_count": 11,
    "process": 900142684,
    "next_reader": 4367,
    "next_reader_count": 5
  }
},
...
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 657357947,
  "breakdown": {
    "process_count": 9,
    "process": 657355378,
    "next_reader": 2569,
    "next_reader_count": 3
  }
},
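
To compare the fast and slow cases without reading the raw profile by hand, the HighlightPhase timings can be summed per shard. This is only a minimal sketch: it assumes the profile response lists fetch sub-phases under profile.shards[].fetch.children, which is where the entries above appeared for me. The helper name highlight_time_ms, the shard ids in the sample, and the trimmed-down sample response are made up for illustration. Only the two time_in_nanos values are taken from the "slow" output above.

```python
from typing import Any


def highlight_time_ms(response: dict[str, Any]) -> dict[str, float]:
    """Return the total HighlightPhase time in milliseconds per shard id."""
    times: dict[str, float] = {}
    for shard in response.get("profile", {}).get("shards", []):
        # Assumption: fetch sub-phases are listed under "fetch" -> "children".
        children = shard.get("fetch", {}).get("children", [])
        nanos = sum(
            child.get("time_in_nanos", 0)
            for child in children
            if child.get("type") == "HighlightPhase"
        )
        times[shard.get("id", "?")] = nanos / 1_000_000
    return times


# Hypothetical, trimmed-down response; the shard ids are placeholders and
# the timings are the two "slow" HighlightPhase values quoted above.
sample = {
    "profile": {
        "shards": [
            {
                "id": "shard-a",
                "fetch": {
                    "children": [
                        {"type": "HighlightPhase", "time_in_nanos": 900147051}
                    ]
                },
            },
            {
                "id": "shard-b",
                "fetch": {
                    "children": [
                        {"type": "HighlightPhase", "time_in_nanos": 657357947}
                    ]
                },
            },
        ]
    }
}

for shard_id, millis in highlight_time_ms(sample).items():
    print(f"{shard_id}: {millis:.1f} ms in HighlightPhase")
```

Run against the real responses, this makes the gap obvious: roughly 10 ms per shard for the "fast" documents versus 650 to 900 ms for the "slow" ones.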

Any pointers and ideas are much appreciated. Thanks for your attention.

Sorry, I forgot to mention: I'm on Elasticsearch 8.12.1. I was on 8.10.4 when I noticed the problem, and updating to the latest available version did not change this behavior.

There seems to be a known issue with highlighting, as you can see in this similar post.

There are some GitHub issues linked, mainly this one:

Your issue may be related to this.