Need Help: Upgrade of ES + Large queries = new CPU overload


(Scott Decker) #1

Hey all,
We have been testing the new 1.3.1 release under our current load and
queries, and have found that under the same conditions, with the same
queries, the ES cluster starts to max out CPU, the thread pools fill up,
and query times keep climbing until eventually we have to restart nodes
just to clear things.
On our older (0.20.6) version we do have big queries, think 100+ terms, but
they were all wrapped in a filter and cached. We almost never did any
scoring, and when we did, it was only on a few terms.
So a query may look like the following:

"query": {
"filtered": {
"query": {
"constant_score": {
"query": {
"bool": {
"must": [
{
"bool": {
"should": {
"bool": {
"should": [
{
"term": {
"content": "smyrna"
}
},
{
"term": {
"title": "smyrna"
}
}
]
}
}
}
}
]
}
},
"boost": 1
}
},
"filter": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"must": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"terms": {<fill in long lists of ids
here}

The filter is broken up into multiple sections, and each section is given a
cache name and cache key.
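
To be concrete, one of those cached filter sections might look roughly like the sketch below. This is the pre-2.0 filter-cache syntax; the field name, ids, and cache key here are made up for illustration, not taken from our actual queries:

```json
{
  "terms": {
    "document_id": ["id-1", "id-2", "id-3"],
    "_cache": true,
    "_cache_key": "known_ids_set_1"
  }
}
```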

So, what could have changed between 0.20.6 and 1.3.1 that would cause this
sort of non-scored filtered query to suddenly spend so much CPU time?
I did a thread dump and it shows multiple threads in the .scorer state of
FilteredQuery; not sure if that matters.

Any help figuring out where ES is spending its time on all of this would be
appreciated. We at least have Marvel up and running now, and it tells us
that CPU gets pegged and shows the average query times, but I'm not sure
how to start debugging the query side to see what could have changed under
the hood to cause such a drastic change.
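
One way to see where ES is burning CPU on a 1.x cluster is the hot threads API, which samples the busiest threads on each node and prints their stack traces. A sketch (the `threads` and `interval` values here are just example settings):

```
GET /_nodes/hot_threads?threads=3&interval=500ms
```

If the filter/query side is the problem, you would expect to see the query thread pool dominating, with stacks pointing at the scorer or filter-cache code.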

Thanks,
Scott

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/be45cf36-3b4a-4452-b3bc-461a879dec02%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Scott Decker) #2

Well, in case anyone wants to know, it was because we had
_cache: true
and
_cache_key:
set in our filter sets, basically because they are known filters that do
not change.

For some reason, having these set caused huge amounts of CPU usage. I'm not
sure what was happening behind the scenes, but this was our culprit. We
will have to look into the code and see why this causes such an issue.
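
For anyone hitting the same thing, the fix was simply to drop the explicit cache directives and let ES apply its own filter-cache defaults. Roughly (field name and ids illustrative, as before):

```json
{
  "terms": {
    "document_id": ["id-1", "id-2", "id-3"]
  }
}
```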

On Monday, September 1, 2014 7:50:35 AM UTC-7, Scott Decker wrote:

[quoted text trimmed]

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/19fee079-ce3e-4e77-a4b4-7e95c35f9d98%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(system) #3