Getting sudden bursts of CPU

fgourichon · April 28, 2020, 2:05pm

Hi there,

We just started using Elasticsearch Service and we're facing a few issues we could use some help with.

Our setup and use cases are quite simple:

One index containing 14M documents (orders), used for plain text search on our site (across ~20 fields)
One index containing 2M documents, used for autocomplete suggestions
Both indexes are split into 5 shards, with one replica

Most of the traffic is on the first index, with around 2 requests/second at peak (requests made from our app to the cluster, not per shard).
A bit less than 1 per second on the second one.
Not crazy traffic, and quite regular across the day.

We tried 2 configurations: 3 nodes with 8GB of memory and 2 nodes with 15GB.
Under normal load, we don't seem to have any memory issues, and CPU usage stays below 20%.

We regularly see the CPU of one node go up to 100% (in both configs) and stay stuck there for several minutes. We sometimes had to restart the cluster to unblock things.

During those episodes, we can see that the search queue of this node is full
The CPU spike is very sudden: it goes from 20% to 100% in a couple of seconds
Kibana doesn't report any search or index metrics during those episodes
We tried reproducing this behaviour on a perf cluster with the same data, without any success. When we replay what we think is the Prod traffic and increase the rps, response times degrade progressively, but the cluster doesn't 'break' like that

We are unsure about the nature of the traffic during those episodes. We see some traces of spikes in searches/second after the burst of CPU, but it's hard to tell whether that is a real cause or a glitch in Kibana reporting, as some data points seem to be missing.

We must be missing something obvious, but can't see what it is...

How can we tell what's causing those bursts of CPU? I know ES can be set up to provide slow query logs, but on Elasticsearch Service I can't figure out how to get those logs.

We're upgrading to 3 x 15GB nodes in the meantime, but it's frustrating not to be able to get to the bottom of the problem.

We're using ES 6.8.8, and for context our app is running on Rails, using the searchkick gem.

Any help greatly appreciated!

DavidTurner · April 28, 2020, 2:21pm

It may help to look at the hot threads API when the CPU spikes. This will show you which threads are busy and what they're up to. If you need help interpreting what you see, share the output here.
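
For anyone who wants to script this, here is a minimal sketch of capturing the hot threads output while a spike is happening. It assumes Python's standard library only; the cluster URL, user name and password are placeholders and not values from this thread. It also builds a `_cat/thread_pool` URL, since the search queue was reported as full. Treat it as a starting point rather than a definitive tool.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def hot_threads_url(base_url, threads=3, interval="500ms", thread_type="cpu"):
    """Build the URL for GET /_nodes/hot_threads (the response is plain text)."""
    query = urlencode({"threads": threads, "interval": interval, "type": thread_type})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def search_pool_url(base_url):
    """Build the URL for the search thread pool view of the cat thread pool API."""
    columns = "node_name,name,active,queue,rejected"
    return f"{base_url.rstrip('/')}/_cat/thread_pool/search?v&h={columns}"


def basic_auth_header(user, password):
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_text(url, auth_header=None, timeout=10):
    """GET a URL and return the body as text; raises on HTTP or network errors."""
    request = urllib.request.Request(url)
    if auth_header:
        request.add_header("Authorization", auth_header)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage during a spike, with placeholder values: `print(fetch_text(hot_threads_url("https://my-cluster.example:9243"), basic_auth_header("elastic", "changeme")))`. Running it a few times in a row and saving each output makes it easier to see which threads stay busy.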

fgourichon · April 30, 2020, 7:37am

Thanks David,
After bumping the cluster up to three 15GB nodes we have not seen any new issues, so "unfortunately" there has been no additional investigation. I'll try to get a better understanding of our traffic so we can reproduce it on a separate cluster.

system · May 28, 2020, 7:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
