504 gateway errors when using postings highlighter

Hi,

When specifying a large page size of 500 and using the postings highlighter, I'm getting 504 gateway errors. When I remove the highlighter from the query, it runs fine. Is the highlighter the main cause of the error, or something else?

Here is my query:

GET test/_search
{
  "from": 0,
  "size": 500,
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "order": "score",
    "fields": {
      "content.plain": {
        "fragment_size": 50,
        "no_match_size": 50,
        "number_of_fragments": 0
      },
      "title.plain": {
        "pre_tags": [
          ""
        ],
        "post_tags": [
          ""
        ],
        "fragment_size": 300,
        "number_of_fragments": 0,
        "matched_fields": [
          "title.plain"
        ]
      }
    }
  },
  "query": {
    "match": {
      "content": "ben and lad"
    }
  }
}

It's a timeout while Kibana waits for Elasticsearch to respond. What is the size of the content.plain field in your index? If it's too big, asking for 500 results will be slow, and Kibana will not wait for the response forever. What happens if you try with a smaller size (100, 50)? Can you also share the mapping of the content.plain field?
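For example, you could rerun the exact query above with a lower size (keeping the highlight section so the comparison is fair); the 50 here is just an illustrative value, and the highlight block is elided for brevity:

```json
GET test/_search
{
  "from": 0,
  "size": 50,
  "query": {
    "match": {
      "content": "ben and lad"
    }
  }
}
```

If the smaller request returns quickly, that points at per-hit highlighting cost rather than the query itself.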

@jimczi Thanks for the reply. It works fine with a size of 10, but it even happens with a size of 50. Yes, the content fields are quite big: we index PDF documents and all their contents go into the content field, so some fields can be many megabytes in size.

Yes, of course. Here is the mapping for content.plain:

                    "plain": {
                        "type": "text",
                        "analyzer": "standard",
                        "search_analyzer": "disable_highlighting_on_stopwords_merged_hyphens",
                        "search_quote_analyzer": "standard",
                        "store": "yes",
						"term_vector" : "with_positions_offsets"
                    }

here is the search analyzer:

                "disable_highlighting_on_stopwords_merged_hyphens": {
                    "char_filter": [
                        "remove_hyphens"
                    ],
                    "filter": [
                        "lowercase",
                        "english_stopwords"
                    ],
                    "tokenizer": "standard"
                },

Also, would having 5 shards help? We currently have only 3 shards.

With "term_vector" : "with_positions_offsets" the fvh highlighter is used, not the postings one. You can try to reindex with "index_options":"offsets" to use the postings highlighter but I am not sure that this will be faster.
We fixed an issue in 5.6 that caused the highlighting to be run twice, you should also try to upgrade to this version to see if it speed up the query. Finally we introduced a new highlighter called unified, it can highlight from different sources (term vectors, postings, ...) and could also speed up a bit the highlighting of your query.
Regarding big documents, few MBs per documents seems too big, especially if you want to highlight a lot of documents in the same request. You should try with the "index_options":"offsets" but in any case highlighting big documents will inevitably slow down your requests.
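As a sketch, the reindex-and-unified-highlighter suggestion could look like the following. The index name test_offsets, the mapping type name doc, and the trimmed-down field settings are assumptions for illustration; you would carry over your real analyzers and the rest of your mapping:

```json
PUT test_offsets
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "plain": {
              "type": "text",
              "analyzer": "standard",
              "index_options": "offsets"
            }
          }
        }
      }
    }
  }
}

GET test_offsets/_search
{
  "query": {
    "match": {
      "content.plain": "ben and lad"
    }
  },
  "highlight": {
    "fields": {
      "content.plain": {
        "type": "unified"
      }
    }
  }
}
```

With "index_options": "offsets" in the mapping, the unified highlighter can read offsets from the postings list instead of re-analyzing the multi-megabyte field at search time, which is where most of the per-hit cost goes.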

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.