Query string with highlighting: slow and high CPU usage

I have the following query:

```
GET /test/_search?human=true
{
  "profile": true,
  "_source": "title",
  "from": 0,
  "size": 250,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "VITAL SIGNS COMPUTER",
            "default_operator": "AND",
            "analyzer": "standard",
            "allow_leading_wildcard": false,
            "analyze_wildcard": true,
            "fields": [
              "content.plain"
            ]
          }
        }
      ]
    }
  },
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "order": "score",
    "fields": {
      "content.plain": {
        "fragment_size": 150,
        "no_match_size": 150,
        "number_of_fragments": 1
      }
    }
  }
}
```

It's taking approximately 1944 milliseconds for 250 documents. If I remove the highlighter from the query, it takes 977 ms, so the highlighter adds 967 ms, which tells me something is causing it to run slowly. I'm using the FVH highlighter, and the hot threads output also seems to point at the highlighter.
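To put those numbers in perspective, the extra time can be spread over the returned hits. This sketch only uses the two timings reported above; the per-hit figure is an average and assumes every hit costs about the same to highlight, which is unlikely when document sizes vary a lot.

```python
def highlight_overhead_ms(with_highlight_ms, without_highlight_ms, hits):
    """Return (total extra ms, average extra ms per highlighted hit)."""
    extra = with_highlight_ms - without_highlight_ms
    return extra, extra / hits

total, per_hit = highlight_overhead_ms(1944, 977, 250)
print(f"highlighting adds {total} ms, about {per_hit:.2f} ms per hit")

# Assumption: the same average cost applies to a 10-hit page.
print(f"a 10-hit page would add roughly {per_hit * 10:.0f} ms")
```

Under that assumption the default 10-result page would pay only a few dozen milliseconds for highlighting; the 250-result page is what makes the cost visible.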

With the CPU usage being high, could it simply mean that we need more CPU power?

I've observed something interesting: when I set `default_operator` to OR, the query is a lot quicker. Why is that? Shouldn't OR make it slower?

The hot threads suggest that time is spent decompressing documents. So I suspect that with AND your top hits are, on average, larger documents than with OR.
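The intuition can be illustrated with a toy model. This is an assumption for illustration, not how Lucene scores or selects top hits: suppose a document of n tokens contains a given query term with probability 1 - (1 - p)^n, independently for each term. AND requires all three terms and OR only one, so AND filters more strongly in favor of long documents. The sketch compares the expected length of a matching document under each operator for a made-up corpus; `p` and the document lengths are hypothetical.

```python
def match_prob(n_tokens, p):
    """Toy model: probability that a document of n_tokens contains one given term."""
    return 1.0 - (1.0 - p) ** n_tokens

def expected_length(lengths, p, n_terms, operator):
    """Expected length of a matching document, weighting each length by its match probability."""
    weights = []
    for n in lengths:
        q = match_prob(n, p)
        if operator == "AND":
            w = q ** n_terms                # must contain every term
        else:
            w = 1.0 - (1.0 - q) ** n_terms  # must contain at least one term
        weights.append(w)
    return sum(w * n for w, n in zip(weights, lengths)) / sum(weights)

# Hypothetical corpus: one short, one medium and one very long document (equal counts per size).
lengths = [500, 5_000, 200_000]
p = 1e-4  # hypothetical per-token chance of hitting a given term

and_len = expected_length(lengths, p, 3, "AND")
or_len = expected_length(lengths, p, 3, "OR")
print(f"AND: {and_len:,.0f} tokens on average, OR: {or_len:,.0f} tokens on average")
```

In this model the AND result set leans much harder toward the long documents, and if stored content has to be decompressed for every highlighted hit, that cost grows with document size. Real ranking also involves BM25 length normalization and the number of matches, so treat this only as a sketch of the direction of the effect.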

@jpountz Ah, I see. Any suggestions or tips on how I can improve the speed?

Also, regarding decompression, is there more I can read about it?

How large are your documents? Also do you really need to highlight 250 documents?

They are quite large: we index content from PDFs, in many cases more than 1 MB. It's not absolutely necessary. By default we show the user 10 results per page, but they can choose up to a maximum of 250.
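One application-side way to act on this is to send the `highlight` section only for small pages. If every hit were around 1 MB, a 250-hit page would mean on the order of 250 MB of content for the highlighter to load and scan, versus roughly 10 MB for the default 10-hit page. The sketch builds the request from this thread as a Python dict and drops `highlight` above a threshold; `build_search_body` and `highlight_limit` are hypothetical names, not an Elasticsearch feature, and the dict can be sent with whichever client you already use.

```python
def build_search_body(query_text, offset=0, size=10, highlight_limit=10):
    """Build the search request from the thread, highlighting only small pages.

    highlight_limit is a hypothetical application-side policy, not an Elasticsearch setting.
    """
    body = {
        "_source": "title",
        "from": offset,
        "size": size,
        "query": {
            "bool": {
                "must": [{
                    "query_string": {
                        "query": query_text,
                        "default_operator": "AND",
                        "analyzer": "standard",
                        "allow_leading_wildcard": False,
                        "analyze_wildcard": True,
                        "fields": ["content.plain"],
                    }
                }]
            }
        },
    }
    if size <= highlight_limit:
        # Highlight settings copied from the original request (pre_tags/post_tags omitted).
        body["highlight"] = {
            "order": "score",
            "fields": {
                "content.plain": {
                    "fragment_size": 150,
                    "no_match_size": 150,
                    "number_of_fragments": 1,
                }
            },
        }
    return body

small = build_search_body("VITAL SIGNS COMPUTER", size=10)
large = build_search_body("VITAL SIGNS COMPUTER", size=250)
print("highlight" in small, "highlight" in large)  # True False
```

The query part stays identical for every page size, so matching and ordering do not change; only the highlighting work is skipped when the user asks for a large page.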

I don't think there is much to optimize then; the slowness is mostly due to the documents being quite large.

Thanks Adrien, I had suspected that was the case.
