Hello,
I'm having an issue very similar to the one described in "Elastic query takes over 1 minute due to time spent in 'HighlightPhase'".
I have documents with an optional attachments text field, which can be quite large for some documents (up to 20 MB).
I have a query that searches for a match in the title field and highlights that field. If the query matches only smaller documents, the whole query takes at most 30 ms (measured in the Kibana Dev Tools). If it matches many larger documents, the same query takes over 1 s.
The highlighted field is stored.
The simplified query:
{
  "profile": true,
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "_id": [
              /* 20 ids of documents with either large or small attachments */
            ]
          }
        }
      ],
      "should": [
        {
          "simple_query_string": {
            "query": "cat",
            "fields": [
              "title"
            ]
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {
        "type": "fvh"
      }
    }
  },
  "_source": false
}
The mapping of the title field:
"title": {
  "type": "text",
  "store": true,
  "fields": {
    "direct": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "german_simple"
    },
    "exact": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "german_exact"
    },
    "suggest": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "german_trigram_search_suggests"
    }
  },
  "term_vector": "with_positions_offsets",
  "analyzer": "german_decompound"
},
The additional time is completely spent in the "HighlightPhase":
"Fast" documents:
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 10123678,
  "breakdown": {
    "process_count": 10,
    "process": 10119747,
    "next_reader": 3931,
    "next_reader_count": 5
  }
},
...
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 10709348,
  "breakdown": {
    "process_count": 10,
    "process": 10705987,
    "next_reader": 3361,
    "next_reader_count": 4
  }
},
"Slow" documents:
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 900147051,
  "breakdown": {
    "process_count": 11,
    "process": 900142684,
    "next_reader": 4367,
    "next_reader_count": 5
  }
},
...
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 657357947,
  "breakdown": {
    "process_count": 9,
    "process": 657355378,
    "next_reader": 2569,
    "next_reader_count": 3
  }
},
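To put the numbers above side by side, here is a minimal Python sketch that converts the HighlightPhase timings to milliseconds. It uses only the four entries shown (the ones elided by "..." are not included), and it assumes that process_count corresponds to the number of hits highlighted in each entry. The helper name summarize is hypothetical.

```python
# HighlightPhase entries copied from the profile output above
# (only the fields needed here; elided entries are not included).
FAST = [
    {"time_in_nanos": 10123678, "breakdown": {"process_count": 10}},
    {"time_in_nanos": 10709348, "breakdown": {"process_count": 10}},
]
SLOW = [
    {"time_in_nanos": 900147051, "breakdown": {"process_count": 11}},
    {"time_in_nanos": 657357947, "breakdown": {"process_count": 9}},
]


def summarize(phases):
    """Sum highlight time across entries and derive a per-hit average.

    Assumption: process_count is the number of hits processed by the phase.
    """
    total_ns = sum(p["time_in_nanos"] for p in phases)
    hits = sum(p["breakdown"]["process_count"] for p in phases)
    total_ms = total_ns / 1_000_000
    return {"total_ms": total_ms, "hits": hits, "ms_per_hit": total_ms / hits}


if __name__ == "__main__":
    for label, phases in (("fast", FAST), ("slow", SLOW)):
        s = summarize(phases)
        print(f"{label}: {s['total_ms']:.1f} ms total, "
              f"{s['hits']} hits, {s['ms_per_hit']:.2f} ms per hit")
```

Under that assumption, both cases highlight 20 hits, so the difference is in the cost per hit (roughly 1 ms versus roughly 78 ms) rather than in the number of hits. The summed times are not wall-clock time, since the entries may come from shards that are processed in parallel.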
Any pointers and ideas are much appreciated. Thanks for your attention.