Query string with highlighting slow and high cpu usage

imranazad · March 13, 2018, 6:20pm

I have the following query:

GET /test/_search?human=true
{
  "profile": true,
  "_source": "title",
  "from": 0,
  "size": 250,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "VITAL SIGNS COMPUTER",
            "default_operator": "AND",
            "analyzer": "standard",
            "allow_leading_wildcard": false,
            "analyze_wildcard": true,
            "fields": [
              "content.plain"
            ]
          }
        }
      ]
    }
  },
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "order": "score",
    "fields": {
      "content.plain": {
        "fragment_size": 150,
        "no_match_size": 150,
        "number_of_fragments": 1
      }
    }
  }
}

It takes approximately 1944 milliseconds for a size of 250 documents. If I remove the highlighter from the query, it takes 977 milliseconds. The highlighter therefore adds 967 milliseconds, which tells me something is causing it to run slowly. I'm using the FVH highlighter. Here is the output from the hot threads API, which also seems to point to the highlighter.
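A fair comparison needs both requests to differ only in highlighting. The following is a minimal Python sketch (standard library only) that derives the no-highlight request from the same body and turns the two timings into an overhead figure.

Some assumptions apply:
- The helper names `without_highlight` and `highlight_overhead` are hypothetical.
- The body is abbreviated from the request above.
- Nothing is sent to Elasticsearch. The two timings would come from the `took` field (milliseconds) of each search response.

The per-document figure assumes the extra time is spread evenly across the 250 hits.

```python
import copy


def without_highlight(body: dict) -> dict:
    """Return a deep copy of a search body with its "highlight" section removed,
    so the two runs differ only in highlighting (hypothetical helper)."""
    stripped = copy.deepcopy(body)
    stripped.pop("highlight", None)
    return stripped


def highlight_overhead(took_with_ms: float, took_without_ms: float, size: int):
    """Return (extra ms for the whole request, extra ms per returned hit)."""
    extra = took_with_ms - took_without_ms
    return extra, extra / size


# Abbreviated version of the request body from the post.
body = {
    "_source": "title",
    "size": 250,
    "query": {"query_string": {"query": "VITAL SIGNS COMPUTER",
                               "default_operator": "AND",
                               "fields": ["content.plain"]}},
    "highlight": {"fields": {"content.plain": {"fragment_size": 150,
                                               "number_of_fragments": 1}}},
}

plain = without_highlight(body)  # the original body keeps its highlight section

# Timings reported in the post: 1944 ms with highlighting, 977 ms without.
extra, per_doc = highlight_overhead(1944, 977, body["size"])
print(extra, round(per_doc, 2))  # 967 3.87
```

In practice the two timings would be taken from several runs each, since a single measurement can be skewed by caching.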

https://gist.github.com/imranazad/40c734d61be2c31a555bd8b4d69d346d

   86.7% (433.3ms out of 500ms) cpu usage by thread 'elasticsearch[xxxxx][search][T#1]'
     4/10 snapshots sharing following 27 elements
       org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:101)
       org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:138)
       org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState$1.fillBuffer(CompressingStoredFieldsReader.java:531)
       org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState$1.readBytes(CompressingStoredFieldsReader.java:550)
       org.apache.lucene.store.DataInput.readBytes(DataInput.java:87)
       org.apache.lucene.store.DataInput.skipBytes(DataInput.java:350)
       org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.skipField(CompressingStoredFieldsReader.java:246)
       org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:601)

This file has been truncated.

Given the high CPU usage, could it simply mean that we need more CPU power?

imranazad · March 14, 2018, 10:01am

I've observed something interesting. Why is the query a lot quicker when I set the default operator to OR? Shouldn't OR make it slower?

jpountz · March 14, 2018, 10:14am

The hot threads suggest that time is spent decompressing stored documents. So I suspect that with AND your top hits are, on average, larger documents than when you run with OR?
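The stack trace in the hot threads output shows time spent in `LZ4.decompress` while `CompressingStoredFieldsReader.visitDocument` is skipping a field. The path is `visitDocument`, then `skipField`, then `skipBytes`. In other words, even skipping over stored bytes costs decompression.

The toy model below illustrates why the cost of fetching top hits then grows with the size of those hits, even when only a small field such as `title` is returned. It rests on several assumptions:
- It uses Python's `zlib` as a stand-in for the LZ4 codec.
- It uses a made-up layout in which each document's fields are compressed as one unit. This is not Lucene's actual stored-fields format.
- The class and function names are hypothetical.

Following the suspicion above, an AND query whose top hits are mostly long PDFs would pay this cost once per long hit.

```python
import zlib


class ToyStoredFields:
    """Toy model of compressed stored fields (NOT Lucene's on-disk format).

    Each document's fields are serialized together and compressed as one unit,
    so returning even a tiny field forces the whole document to be decompressed.
    zlib stands in for the LZ4 codec seen in the hot threads output.
    """

    def __init__(self) -> None:
        self._blobs: list[bytes] = []

    def add(self, fields: dict[str, str]) -> int:
        # Hypothetical layout: name NUL value NUL pairs (values must not contain NUL).
        raw = b"".join(
            name.encode() + b"\x00" + value.encode() + b"\x00"
            for name, value in fields.items()
        )
        self._blobs.append(zlib.compress(raw))
        return len(self._blobs) - 1

    def get(self, doc_id: int, field: str) -> tuple[str, int]:
        """Return the field value and the number of bytes decompressed to get it."""
        raw = zlib.decompress(self._blobs[doc_id])
        parts = raw.split(b"\x00")[:-1]  # drop the empty piece after the final NUL
        fields = dict(zip(parts[0::2], parts[1::2]))
        return fields[field.encode()].decode(), len(raw)


def page_cost(store: ToyStoredFields, hit_ids: list[int], field: str = "title") -> int:
    """Total bytes decompressed to return `field` for every hit on one page."""
    return sum(store.get(doc_id, field)[1] for doc_id in hit_ids)


store = ToyStoredFields()
short = store.add({"title": "short note", "content.plain": "vital signs " * 10})
long_ = store.add({"title": "long manual",
                   "content.plain": "vital signs computer " * 50_000})

# Only the title is returned in both cases, yet the long document costs about 1 MB.
print(store.get(short, "title"), store.get(long_, "title"))

# Hypothetical 5-hit pages (one document repeated, for brevity).
print(page_cost(store, [short] * 5), page_cost(store, [long_] * 5))
```

The toy model ignores scoring entirely. It only shows that, once the hits are chosen, their size rather than their count dominates the fetch cost.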

imranazad · March 14, 2018, 10:32am

@jpountz Ah, I see. Any suggestions or tips on how I can improve the speed?

imranazad · March 14, 2018, 10:32am

Also, regarding decompression, is there anything more I can read about it?

jpountz · March 14, 2018, 10:36am

How large are your documents? Also, do you really need to highlight 250 documents?

imranazad · March 14, 2018, 1:31pm

They are quite large: we index content from PDFs, in many cases more than 1MB per document. Highlighting 250 documents isn't absolutely necessary. By default we show the user 10 results per page, but they can choose up to a maximum of 250.
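The following is a back-of-the-envelope sketch of why the maximum page size matters with documents this large. It makes two simplifying assumptions:
- Returning and highlighting each hit means decompressing roughly its whole stored content.
- Each document is about 1MB, the size mentioned above.

Under those assumptions the data touched per request grows linearly with `size`: about 250MB for 250 hits versus about 10MB for the default 10. Both helpers, `bytes_loaded_per_page` and `clamp_page_size`, are hypothetical. The second only mirrors the 10-by-default, 250-maximum rule described in the reply.

```python
MB = 1024 * 1024


def bytes_loaded_per_page(page_size: int, avg_doc_bytes: int) -> int:
    """Rough estimate of stored bytes decompressed to fetch and highlight one page,
    assuming each hit's full stored content is read (a simplification)."""
    return page_size * avg_doc_bytes


def clamp_page_size(requested, default: int = 10, maximum: int = 250) -> int:
    """Hypothetical helper mirroring the UI rule: 10 results by default, at most 250."""
    if requested is None:
        return default
    return max(1, min(requested, maximum))


for requested in (None, 50, 250, 1000):
    size = clamp_page_size(requested)
    mb = bytes_loaded_per_page(size, 1 * MB) / MB
    print(f"requested={requested} size={size} ~{mb:.0f} MB of stored content")
```

By this estimate, the 250-result option touches about 25 times as much stored content as the default page. That scale factor holds whatever the true average document size is.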

jpountz · March 14, 2018, 1:49pm

I don't think there is much to optimize, then; the slowness is mostly due to the documents being quite large.

imranazad · March 14, 2018, 3:17pm

Thanks Adrien, I had suspected that was the case.

system · April 11, 2018, 3:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Highlighting performance issues with stored field and fvh highlighter	Elasticsearch	3	282	March 13, 2024
Need help with highlight api performance	Elasticsearch	2	384	March 22, 2022
Highlighting is extremely slow on concurrent requests	Elasticsearch	1	1205	December 4, 2016
Fast vector highlighter (fvh) making searches slower	Elasticsearch	7	1214	December 29, 2021
Optimize elasticsearch query using highlighting	Elasticsearch	1	554	October 12, 2018