Elasticsearch process causes CPU soft lockup (causing the server to hang)

farin99 · May 17, 2018, 2:26pm

We are experiencing hangs on multiple machines that run only an Elasticsearch data node.
When it occurs, you see the symptom of a soft lockup: a task or kernel thread holding a CPU, without releasing it, for longer than the watchdog allows. The logs show that it comes from the java process (Elasticsearch is the only Java process running on that machine).

May 14 05:24:21 localhost kernel: [6006808.160001] watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [java:5783]

This has already occurred on 5 different machines.
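
For anyone comparing logs across several machines, here is a minimal Python sketch that pulls the CPU number, stuck duration, task name, and PID out of a line like the one quoted above. The regular expression is modeled only on that single log line, and the function name `parse_soft_lockup` is made up for illustration; other kernel versions may format the message differently. Note that the name in brackets is the task that was current on the CPU when the watchdog fired, which by itself does not show that the task is at fault.

```python
import re

# Pattern modeled on the one log line quoted in this post (an assumption;
# other kernels may word the watchdog message differently).
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return cpu, secs, comm and pid as a dict, or None if the line does not match."""
    match = SOFT_LOCKUP.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "secs": int(match.group("secs")),
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
    }


line = (
    "May 14 05:24:21 localhost kernel: [6006808.160001] "
    "watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [java:5783]"
)
print(parse_soft_lockup(line))
```

Running this over the kernel log of each affected machine gives a quick way to check whether the reports always name the same task and how long the CPU was stuck each time.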

A full description is in https://github.com/elastic/elasticsearch/issues/30667

Thanks!

farin99 · May 20, 2018, 9:25pm

Hi @jasontedor, since you closed the issue on GitHub, I was wondering if you could elaborate here on why you are sure this isn't an issue in Elasticsearch?
The reason I'm asking is that we are in an ongoing investigation with Azure, which has already opened a ticket with Canonical about this issue. When Azure discussed this issue with Canonical, they insisted that the issue is with the java process (in this case Elasticsearch) not releasing the CPU.
If you could provide us with any information, or guide us toward proof that this is not an Elasticsearch issue, it would help expedite finding the root cause.

Thanks for your help!
Yoni.

jasontedor · May 21, 2018, 2:02am

Please read through the thread that I linked to from GitHub. It looks identical to the problem that you're experiencing, is on the same kernel, and references several LKML threads on similar issues. This screams kernel issue, not Java issue to me, and is almost surely not an Elasticsearch issue. I am open and willing to be proven wrong, but it will require evidence.

farin99 · May 21, 2018, 8:52am

@jasontedor thanks for the quick response. It appears you are right and it is a kernel issue on Azure which should be fixed in the next kernel update.
FYI, in case you have other customers who are seeing the same behavior:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1772264

