Search response time doubled erratically

Hi,

Our system recently faced a CPU usage spike, and the underlying reason is still unknown. We have faced high memory usage and disk alerts in the past, since we run a nightly bulk indexing job that updates almost all of our docs, but high CPU usage has never been a problem before.

The data collected so far:

Node 03 (out of 6 data nodes and 3 master nodes) suffered from high CPU usage (> 95%) for 5 minutes, pushing response time to a spike of 1 sec, while the average response time is 40 ms.
Looking through the metrics, there was a slight bump in the indexing count on the high-CPU node, and at the same time a slight bump in Young GC (neither was anything like a spike, though).

I am not ruling out heavy indexing, since we do have a Kafka consumer accepting bulk indexing data at any time of day, but it is throttled to a maximum of 250 docs per second, with a 250 ms pause between each bulk call.
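Roughly, the throttling looks like the sketch below (a simplified version; the endpoint URL, index name, and the poll_kafka helper are placeholders, not our actual code):

```python
import json
import time
import requests

BULK_URL = "http://localhost:9200/_bulk"  # placeholder cluster endpoint
BATCH_SIZE = 250                          # at most 250 docs per bulk call
PAUSE_SECONDS = 0.25                      # 250 ms lag between bulk calls

def index_batch(docs, index_name="docs"):
    """Send one _bulk request containing up to BATCH_SIZE documents."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"        # the bulk API expects NDJSON with a trailing newline
    resp = requests.post(BULK_URL, data=body,
                         headers={"Content-Type": "application/x-ndjson"})
    resp.raise_for_status()

def consume(poll_kafka):
    """poll_kafka is a stand-in for our consumer loop; it yields lists of docs."""
    for docs in poll_kafka():
        for i in range(0, len(docs), BATCH_SIZE):
            index_batch(docs[i:i + BATCH_SIZE])
            time.sleep(PAUSE_SECONDS)     # throttle between bulk calls
```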

Also, the hot threads endpoint did give some data, although I am not able to decipher it yet.

Link to Hot threads
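In case it helps, this is how the hot threads output can be pulled from the cluster (the host and parameters here are just an example, not our exact call):

```python
import requests

# Example only: assumes a node is reachable on the default HTTP port.
resp = requests.get(
    "http://localhost:9200/_nodes/hot_threads",
    params={"threads": 3, "interval": "500ms"},  # top 3 hot threads, sampled over 500 ms
)
print(resp.text)  # plain-text stack samples, one section per node
```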

Hey,

so from the hot_threads output it rather looks as if search was eating some CPU (but by far not all of it, so it might actually be fine), as the threads mentioned look like [shopo-elasticsearch-prd-sg2-02][search][T#3]

You should also check your GC statistics (part of the node stats), and you can also check your log files for long-running GCs.
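For example, the per-collector counts and times are under jvm.gc.collectors in the node stats; something like this quick sketch will dump them (localhost is just a placeholder for one of your nodes):

```python
import requests

# Assumes the cluster is reachable on localhost:9200.
stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()

for node in stats["nodes"].values():
    collectors = node["jvm"]["gc"]["collectors"]
    for name in ("young", "old"):
        print(node["name"], name,
              collectors[name]["collection_count"], "collections,",
              collectors[name]["collection_time_in_millis"], "ms total")
```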

--Alex

There has been some development. After the spike, CPU usage decreased gradually and is normal.
However, our response time is consistently staying between 70-250 ms (usual average: 35-100 ms).
There is currently a near-sawtooth (not an exactly uniform sawtooth) pattern in the response time.

As per your suggestion, there was a small bump in the old GC count when the spike occurred.

I haven't found any anomaly in the node stats so far. Will update if I find one. Still posting it here for investigation.

node stats

Also posting the most recent hot threads output.
hot_thread_2