Node seems to lock up randomly

I have two different five-node ES clusters, both running on the same VMware cluster. Every now and again the cluster health will go to yellow or even red. When I look at Marvel, I'll see that one of the nodes is no longer part of the cluster.

I'll log into the node and can move around various directories just fine until I try to go into the directory ES is installed in. For example, if I just run an ls within /opt (which is where ES is installed), the shell hangs and I cannot do anything within /opt.

I can still go into, say, /var/log and look at various log files, but I don't see anything explaining why that part of the filesystem is inaccessible.
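A hang like that (ls blocking on one directory while the rest of the box works) usually means a process is stuck in uninterruptible sleep on I/O, and that shows up in the kernel log rather than in anything under /var/log. A sketch of what to check from another shell the next time it happens, using standard procps/kernel tooling (nothing here is ES-specific):

```shell
# Processes in uninterruptible sleep ("D" state) are blocked in the
# kernel, typically waiting on disk I/O that never completes.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'

# Hung-task and block-layer errors go to the kernel ring buffer,
# not to application log files.
dmesg | grep -iE 'hung_task|blocked for more than|i/o error' | tail -n 20
```

If the D-state list includes the stuck ls (or an ES thread), the wchan column hints at which kernel function it is blocked in.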

Once I reboot, the system comes back up just fine: ES, Kibana, and Marvel all start and the node rejoins the cluster. It then works for a varying amount of time before it happens again; it might be a day or a week, and it's even gone a couple of weeks without a problem.

- OEL 7, kernel 3.10.0-123.el7
- OpenJDK 25.101-b13
- JRE 1.8.0_101-b13
- ES 2.3.4
- Kibana 4.5.4

Each VM has 32 GB of memory and 4 CPUs.

This sounds like a filesystem problem. Are you using NFS mounted inside the VM, or underlying block storage that the hypervisor pulls in?
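For what it's worth, that question can be answered from inside the guest with standard util-linux tools (a sketch; /opt is just the path mentioned above):

```shell
# Any NFS mounts visible inside the VM? No output means the guest
# itself isn't mounting NFS.
findmnt -t nfs,nfs4 || echo "no NFS mounts in the guest"

# What actually backs a given path (source device and fs type).
findmnt -T /opt
```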

The underlying storage is iSCSI; all the Elasticsearch data sits on a virtual hard drive presented to the OS through VMware.
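Given that chain (guest → VMware virtual disk → iSCSI), a stall anywhere along it surfaces inside the guest as block I/O that never completes. A hedged sketch of what to capture inside the VM while the hang is in progress (column positions are from the documented Linux /proc/diskstats format):

```shell
# Column 12 of /proc/diskstats is "I/Os currently in progress"; on a
# stuck device it sits at a non-zero value and never drains.
awk '{ print $3, $12 }' /proc/diskstats

# SCSI resets/aborts/timeouts from the virtual disk layer land in the
# kernel log (reading it may require root).
dmesg | grep -iE 'scsi|abort|reset|timeout' | tail -n 20
```

Sampling the diskstats line twice a few seconds apart also tells you whether the device is merely slow (counters advancing) or wedged (counters frozen with I/Os in flight).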

I had a discussion with some of the folks on my team and we're debating XFS vs. ext4. What is the recommended filesystem? We have another cluster in production that uses ext4; ours, which is having the problems, uses XFS.
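Before comparing the two clusters, it's worth confirming what each node is actually running on. A quick check (the path is an example; substitute your own data directory):

```shell
# Filesystem type backing a given path
df -T /opt

# Filesystem type per block device, cluster-wide sanity check
lsblk -f
```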

We don't really make recommendations there (other than to stay away from NFS as a data mount the application uses directly).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.