A corrupted shard caused an ES node to go down

HelloUniverse · November 23, 2017, 1:07am

Has anyone encountered this kind of problem? A corrupted shard that nevertheless appeared healthy caused an ES node to shut down, and a memory dump was created.

We are running ES 5.4.1. We have an index with more than 200GB of data and 12 shards. A few days ago, one node holding one of these shards shut down. Although we could start the node again and the cluster turned green, after a while the node shut down again. We tried several times, and it always shut down. [ During this period, a few Logstash instances were inserting data into the cluster, including into this index. ]

At first, we guessed that the node might be under heavy load, so we tried to move the shard to another node. Strangely, the shard could not be moved: the reroute command always failed. But when we moved other shards on this node, even bigger ones, the move always succeeded.
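For readers who want to reproduce the move attempt described above, here is a minimal sketch of the request body for the cluster reroute API's `move` command. The index name, shard number, and node names are hypothetical placeholders, not values from this thread, and the sketch only builds the JSON; it does not contact a cluster.

```python
import json


def build_move_command(index, shard, from_node, to_node):
    """Build the JSON body for POST /_cluster/reroute that moves one shard copy."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


if __name__ == "__main__":
    # All four values below are illustrative assumptions.
    body = build_move_command("my-index", 3, "node-a", "node-b")
    print(json.dumps(body, indent=2))
```

Sending this body with the `explain` query parameter (POST /_cluster/reroute?explain) should make the response include the reasons a command was accepted or rejected, which is the kind of detail that helps when a move keeps failing.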

So we concluded that this shard had been corrupted, even though it showed as healthy. After we removed this index completely, the node returned to normal.

Has anyone encountered this kind of problem before?
Any ideas are appreciated.

dadoonet · November 23, 2017, 2:03am

Which version?
Can you share the logs before it crashes?

HelloUniverse · November 23, 2017, 6:55am

The ES version is 5.4.1. There are no logs related to this crash. We just observed a memory dump generated at that moment.
Unfortunately, due to a system limitation, the dump contains only a few lines.

Is there any possibility that a shard that looks healthy but contains abnormal data could lead to a node crash?

dadoonet · November 23, 2017, 7:09am

Is there any possibility that a shard that looks healthy but contains abnormal data could lead to a node crash?

Well. Not on purpose. @jasontedor does this remind you of anything?

The ES version is 5.4.1.

Could you upgrade to the latest version?

What is your exact JVM version? Please share the output of java -version.

HelloUniverse · November 23, 2017, 8:39am

1.8.0_66-b17

dadoonet · November 23, 2017, 9:14am

Can you upgrade to the latest JVM version? For example:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

HelloUniverse · November 23, 2017, 9:21am

Thanks for your suggestion. Sure, we can do that.

Are there any reported bugs related to older JDK versions?

dadoonet · November 23, 2017, 9:36am

Officially we do support Oracle JVM 1.8u60+ and IcedTea OpenJDK 1.8.0.111+.

See https://www.elastic.co/support/matrix#matrix_jvm
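As a quick illustration of the minimum stated above, the following sketch checks a Java 8 version string against the "Oracle JVM 1.8u60+" threshold. It assumes the `1.8.0_NN` format shown in this thread and is not an official compatibility check; the function names are hypothetical.

```python
import re

# Minimum Oracle Java 8 update, taken from "Oracle JVM 1.8u60+" above.
MIN_ORACLE_UPDATE = 60


def parse_java8_update(version):
    """Return the update number from a string like '1.8.0_66-b17', or None."""
    match = re.match(r"^1\.8\.0_(\d+)", version)
    return int(match.group(1)) if match else None


def meets_oracle_minimum(version):
    """True if the string is a Java 8 version at or above the minimum update."""
    update = parse_java8_update(version)
    return update is not None and update >= MIN_ORACLE_UPDATE


if __name__ == "__main__":
    for v in ("1.8.0_66-b17", "1.8.0_144"):
        print(v, meets_oracle_minimum(v))
```

By this check, the reported 1.8.0_66-b17 is already at or above the stated minimum, so the suggested upgrade to 1.8.0_144 is a precaution rather than a fix for an unsupported JVM.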

What is your vendor?

HelloUniverse · November 24, 2017, 1:24am

We are using the Oracle JVM. In most cases, ES works excellently.
As you suggested, we plan to upgrade to the latest JDK 1.8 later.

jasontedor · November 24, 2017, 2:46am

Sorry, it does not. The description is lacking sufficient detail (logs, error messages, etc.) for us to triage this one.

system · December 22, 2017, 2:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
