Unified Highlighter is too slow


(Aslamy) #1

Hi!

When I use the unified highlighter, it takes about 20-22 seconds to retrieve the results, but when I switch to the fvh highlighter it takes about 2-3 seconds.
Why does Elasticsearch recommend the unified highlighter and make it the default highlighter? The performance is very bad.


(Jimferenczi) #2

Can you provide a small recreation with your mapping, the query that is slow with the unified highlighter, and a sample document? The unified highlighter uses the term vectors when they are enabled on the field, so it is not expected to be 10x slower than the fvh.


(Aslamy) #3

Hi!
I'm not allowed to share the company's code, but everything else is exactly the same in my tests. The fields are indexed with "term_vector": "with_positions_offsets".

The only thing I change is "type": "unified" to "type": "fvh".
When I search for one or two tokens, both highlighters perform about the same, but the more tokens the query contains, the slower the unified highlighter gets.

A 5-token query takes about 20-22 seconds with the unified highlighter but 2-3 seconds with the fvh highlighter.


(Jimferenczi) #4

Can you at least share the query that you used? How many fields and documents are highlighted, and what is the average size of the documents?


(Aslamy) #5

The index has 15683 documents and is 1.5 GB in size.
At index time we copy 5 fields into one field called "content".
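
For readers trying to reproduce this setup, here is a minimal Python sketch of a mapping that copies several fields into "content" and stores term vectors on it. The source field names passed in are hypothetical (the thread does not name the five fields), and the typeless `mappings.properties` layout is an assumption that matches Elasticsearch 7 and later.

```python
def build_mapping(source_fields, target_field="content"):
    """Build an index mapping where each source field is copied into target_field.

    The source field names are placeholders; the thread does not list them.
    """
    if target_field in source_fields:
        raise ValueError("target_field must not be one of the source fields")

    # Each source field is plain text and is copied into the combined field.
    properties = {
        name: {"type": "text", "copy_to": target_field} for name in source_fields
    }
    # The combined field stores term vectors with positions and offsets,
    # which the fvh highlighter requires and the unified highlighter can use.
    properties[target_field] = {
        "type": "text",
        "term_vector": "with_positions_offsets",
    }
    return {"mappings": {"properties": properties}}
```

Calling `build_mapping(["field_a", "field_b"])` returns a body that could be sent when creating a test index. The real mapping in this thread may differ.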

    {
        "from": 0,
        "size": 10,
        "sort": [],
        "highlight": {
            "pre_tags": [
                "<strong>"
            ],
            "post_tags": [
                "</strong>"
            ],
            "fields": {
                "document.title": {
                    "no_match_size": 512,
                    "number_of_fragments": 0,
                    "type": "unified"
                },
                "content": {
                    "fragment_size": 130,
                    "no_match_size": 256,
                    "number_of_fragments": 2,
                    "type": "unified"
                }
            }
        },
        "query": {
            "bool": {
                "must": [{
                    "multi_match": {
                        "query": "word",
                        "operator": "and",
                        "fields": [
                            "document.title^5",
                            "content"
                        ]
                    }
                }]
            }
        }
    }
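
To keep a unified-versus-fvh comparison fair, both requests can be generated from one template so that they differ only in the highlighter "type". The following is a minimal Python sketch under that assumption. `BASE_BODY` and `build_search_body` are illustrative names, not part of any Elasticsearch client.

```python
import copy

# The request body from this post, with the highlighter "type" left out.
BASE_BODY = {
    "from": 0,
    "size": 10,
    "sort": [],
    "highlight": {
        "pre_tags": ["<strong>"],
        "post_tags": ["</strong>"],
        "fields": {
            "document.title": {"no_match_size": 512, "number_of_fragments": 0},
            "content": {
                "fragment_size": 130,
                "no_match_size": 256,
                "number_of_fragments": 2,
            },
        },
    },
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "word",
                        "operator": "and",
                        "fields": ["document.title^5", "content"],
                    }
                }
            ]
        }
    },
}


def build_search_body(query_text, highlighter_type):
    """Return a copy of BASE_BODY with the query text and highlighter type set."""
    if highlighter_type not in ("unified", "fvh", "plain"):
        raise ValueError("unknown highlighter type: " + repr(highlighter_type))
    body = copy.deepcopy(BASE_BODY)  # never mutate the shared template
    body["query"]["bool"]["must"][0]["multi_match"]["query"] = query_text
    for field_options in body["highlight"]["fields"].values():
        field_options["type"] = highlighter_type
    return body
```

Each body can then be sent to the `_search` endpoint, and the `took` value of the two responses compared for the same query text.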

(Aslamy) #6

@jimczi A very important discovery:
When I remove "operator": "and" from the multi_match query, the response time drops from 20-22 seconds to 819 milliseconds :scream_cat:


(Jimferenczi) #7

Highlighting runs only on the top documents (in your example the top 10, since you set size to 10), so changing the operator should not affect this phase directly. However, the documents returned in the top 10 will be different, so I suspect that you have one or several very big documents that make the highlighting slower when you use the query with the and operator. Can you check the size of the documents in both cases?
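
One rough way to check this is to measure the serialized size of each returned hit. The Python sketch below assumes a search response already parsed into a dict and that `_source` is returned with the hits. `source_sizes` is an illustrative helper name. Note that fields populated through copy_to do not appear in `_source`, so the number reflects the original fields only.

```python
import json


def source_sizes(response):
    """Return (doc_id, size_in_bytes) pairs for the hits, largest first.

    The size is the UTF-8 length of the JSON-serialized _source, which is
    an approximation of how much text the highlighter may have to process.
    """
    sizes = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source")
        if source is None:
            size = 0  # _source was disabled or filtered out for this hit
        else:
            size = len(json.dumps(source, ensure_ascii=False).encode("utf-8"))
        sizes.append((hit["_id"], size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Running this on the responses for the query with and without "operator": "and" would show whether the slow case returns much larger documents.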


(Aslamy) #8

@jimczi I think you are right.
When "operator": "and" is set, the size of the documents found is 80 MB, and when it is not set the size is 2 MB.
Do you have any suggestions for solving this problem?

The fvh highlighter performs better on 80 MB of data than the unified highlighter.


(Jimferenczi) #9

We didn't test this extreme case, so I'll need to investigate a bit. I'll run some tests on my side and come back here shortly. Thanks for reporting.

