504 gateway errors when using postings highlighter

Hi,

When specifying a large page size of 500 and using the postings highlighter, I'm getting 504 gateway errors. When I remove the highlighter from the query, it runs fine. Is the highlighter the main cause of the error, or something else?

Here is my query:

GET test/_search
{
  "from": 0,
  "size": 500,
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "order": "score",
    "fields": {
      "content.plain": {
        "fragment_size": 50,
        "no_match_size": 50,
        "number_of_fragments": 0
      },
      "title.plain": {
        "pre_tags": [
          ""
        ],
        "post_tags": [
          ""
        ],
        "fragment_size": 300,
        "number_of_fragments": 0,
        "matched_fields": [
          "title.plain"
        ]
      }
    }
  },
  "query": {
    "match": {
      "content": "ben and lad"
    }
  }
}

It's a timeout while Kibana waits for Elasticsearch to respond. What is the size of the content.plain field in your index? If it's too big, asking for 500 results will be slow, and Kibana will not wait for the response forever. What happens if you try with a smaller size (100, 50)? Can you also share the mapping of the content.plain field?
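For example, you could rerun the exact query above with a lower size (keeping the highlight section so the comparison is fair); the 50 here is just an illustrative value, and the highlight block is elided for brevity:

```json
GET test/_search
{
  "from": 0,
  "size": 50,
  "query": {
    "match": {
      "content": "ben and lad"
    }
  }
}
```

If the smaller request returns quickly, that points at per-hit highlighting cost rather than the query itself.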

@jimczi Thanks for the reply. It works fine with a size of 10, but it even happens with a size of 50. Yes, the content fields are quite big: we index PDF documents and all their contents go into the content field, so some fields can be many megabytes in size.

Yes, of course. Here is the mapping for content.plain:

                    "plain": {
                        "type": "text",
                        "analyzer": "standard",
                        "search_analyzer": "disable_highlighting_on_stopwords_merged_hyphens",
                        "search_quote_analyzer": "standard",
                        "store": "yes",
						"term_vector" : "with_positions_offsets"
                    }

here is the search analyzer:

                "disable_highlighting_on_stopwords_merged_hyphens": {
                    "char_filter": [
                        "remove_hyphens"
                    ],
                    "filter": [
                        "lowercase",
                        "english_stopwords"
                    ],
                    "tokenizer": "standard"
                },

Also, would having 5 shards help? We currently have only 3 shards.

With "term_vector" : "with_positions_offsets" the fvh highlighter is used, not the postings one. You can try to reindex with "index_options":"offsets" to use the postings highlighter but I am not sure that this will be faster.
We fixed an issue in 5.6 that caused the highlighting to be run twice, you should also try to upgrade to this version to see if it speed up the query. Finally we introduced a new highlighter called unified, it can highlight from different sources (term vectors, postings, ...) and could also speed up a bit the highlighting of your query.
Regarding big documents, few MBs per documents seems too big, especially if you want to highlight a lot of documents in the same request. You should try with the "index_options":"offsets" but in any case highlighting big documents will inevitably slow down your requests.
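As a sketch, the reindex-and-unified-highlighter suggestion could look like the following. The index name test_offsets, the mapping type name doc, and the trimmed-down field settings are assumptions for illustration; you would carry over your real analyzers and the rest of your mapping:

```json
PUT test_offsets
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "plain": {
              "type": "text",
              "analyzer": "standard",
              "index_options": "offsets"
            }
          }
        }
      }
    }
  }
}

GET test_offsets/_search
{
  "query": {
    "match": {
      "content.plain": "ben and lad"
    }
  },
  "highlight": {
    "fields": {
      "content.plain": {
        "type": "unified"
      }
    }
  }
}
```

With "index_options": "offsets" in the mapping, the unified highlighter can read offsets from the postings list instead of re-analyzing the multi-megabyte field at search time, which is where most of the per-hit cost goes.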

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.