CPUs at 100%, but no disk I/O - ES 7.3

Aharon · August 22, 2019, 9:05pm

ES Version 7.3
Various node sizes (hot/warm) (25+ nodes)
SSDs on all nodes
JDK 11
X-Pack Document level security (thank you for fixing the bitset issue in 7.3)

Periodically, our warm nodes, which normally sit at 0% CPU utilization, spike to 100% for several minutes and then drop back down to 0%. During these spikes, iostat shows 0 disk I/O. In fact, disk I/O on our warm nodes is very low in general. The operating system confirms (via top) that the CPU spike comes from ES. The lack of disk I/O is concerning because it makes me feel like I can't shard my way out of the spike.

Here is a link to a snippet of our hot_threads:

Any suggestions?

rugenl · August 22, 2019, 11:07pm

Any activity in the gc logs?

Aharon · August 22, 2019, 11:21pm

The GC logs are normal (we run them through an analyzer as well, just to be sure). The boxes have a lot of free heap. Thank you so much for responding!!

We did notice this issue increase after enabling document level security with our own realm...

Thanks,
Aharon

spinscale · August 23, 2019, 7:14am

If that CPU spike happens again, use the hot threads API to figure out which part of the Elasticsearch code the time is being spent in.
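For reference, a minimal sketch of calling the hot threads API from Python while a spike is in progress. The endpoint (`/_nodes/hot_threads`) and the `threads`, `interval`, and `type` parameters are part of Elasticsearch; the helper names, the base URL, and the `box_type:warm` node filter are assumptions for illustration (use whatever node attribute marks your warm tier). Authentication is left out, so a secured cluster would need credentials added.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, node_filter=None, threads=3,
                    interval="500ms", thread_type="cpu"):
    """Build a hot threads URL, optionally restricted to some nodes.

    node_filter is passed through as an Elasticsearch node filter,
    e.g. a node name or an attribute match such as "box_type:warm"
    (the attribute name here is hypothetical).
    """
    if node_filter is None:
        path = "/_nodes/hot_threads"
    else:
        path = "/_nodes/" + quote(node_filter, safe=":*,") + "/hot_threads"
    query = urlencode({
        "threads": threads,      # how many of the busiest threads to report
        "interval": interval,    # sampling window
        "type": thread_type,     # "cpu", "wait" or "block"
    })
    return base_url.rstrip("/") + path + "?" + query


def fetch_hot_threads(url, timeout=30):
    """Return the plain-text hot threads report (no authentication)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (requires a reachable cluster; URL is an assumption):
# print(fetch_hot_threads(hot_threads_url("http://localhost:9200", "box_type:warm")))
```

Running this a few times during a spike and comparing the stack traces usually shows whether the time is going into search, merging, or something else.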

Bertrand · August 23, 2019, 8:35am

We noticed the same behaviour on our cluster as well. It was caused by queries generated by Kibana's KQL Value Suggestions feature while a user is typing a query. We have several aliases targeting >1K indices. The value suggestion mechanism sends queries against all of those indices to discover the top terms, but without considering the selected time frame. This results in a huge load on the cluster...
Our temporary solution was to disable KQL value suggestions in Kibana until we investigate the issue further.
Maybe you are facing the same issue...
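To illustrate why the missing time frame matters, here is a rough sketch of two search bodies. This is not the exact request Kibana sends; it only contrasts a top-terms aggregation with and without a time range filter. The function name and the field names `host.name` and `@timestamp` are assumptions.

```python
def top_terms_body(field, size=10, time_field=None, gte=None, lte="now"):
    """Build a search body that returns the top terms of `field`.

    Without a time range, every index behind the alias has to run the
    aggregation. With a range filter, shards that only hold older data
    have nothing to match.
    """
    body = {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "suggestions": {"terms": {"field": field, "size": size}}
        },
    }
    if time_field and gte:
        body["query"] = {
            "bool": {
                "filter": [
                    {"range": {time_field: {"gte": gte, "lte": lte}}}
                ]
            }
        }
    return body


# Not time aware: aggregates over everything the alias points at.
unbounded = top_terms_body("host.name")

# Time aware: restricted to the last 15 minutes.
bounded = top_terms_body("host.name", time_field="@timestamp", gte="now-15m")
```

Sending both bodies to the same alias and watching CPU on the warm nodes should show the difference, assuming the warm nodes only hold older data.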

Aharon · August 23, 2019, 10:48am

spinscale, I may have already posted a hot-threads output in my first post (unless I used the API wrong)

Bertrand, thank you so much for your suggestion! I logged into our production environment and ran iostat on a warm node while I typed a field_name: in the Discover search bar... The search time frame was the default of 15 minutes. All of a sudden, all my warm nodes went to 25% CPU with no disk I/O. My warm nodes only have data from >7 days ago... So you are correct, the Kibana value suggestions are not time-aware. We will be disabling them.

Elastic team - I think that the Kibana value suggestions play a role, but not the entire role, as I was never able to get them to consume more than 25% of the processors on our warm nodes. Is there anything else similar that I can disable that may not be time-aware and may inadvertently query our warm nodes?

spinscale · August 23, 2019, 5:40pm

I must have missed the hot threads output, sorry for that.

Regarding speeding up the suggestions, you may find https://github.com/elastic/kibana/pull/37643 interesting.
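On the question of what else might be querying the warm nodes: one option is to look at the task management API during a spike (`GET /_tasks?actions=*search*&detailed=true`) and check which searches are running on which nodes. Below is a small sketch that summarizes such a response. The helper name is an assumption; the `nodes`, `tasks`, `action`, `description`, and `running_time_in_nanos` keys follow the tasks API response.

```python
def slowest_search_tasks(tasks_response, limit=5):
    """Return (seconds_running, node_name, description) tuples for
    search tasks, longest-running first.

    `tasks_response` is the parsed JSON returned by
    GET /_tasks?actions=*search*&detailed=true
    """
    rows = []
    for node in tasks_response.get("nodes", {}).values():
        node_name = node.get("name", "?")
        for task in node.get("tasks", {}).values():
            if "search" not in task.get("action", ""):
                continue  # ignore anything that is not a search action
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            rows.append((seconds, node_name, task.get("description", "")))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

With `detailed=true`, the description of a search task includes the target indices and the query source, which should help tell whether a request carries a time range at all.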

