Has anyone encountered this kind of problem? A corrupted shard that still appeared healthy caused an ES node to shut down, and a memory dump was created.
We are using ES 5.4.1. We have an index with more than 200 GB of data and 12 shards. A few days ago, one node holding one of this index's shards shut down. We could start the node again and the cluster turned green, but after a while the node shut down again. We tried this several times, and it always shut down. [ During this period, a few Logstash instances were inserting data into the cluster, including this index. ]
At first, we guessed that the node was under heavy load, so we tried to move the shard to another node. Strangely, the shard could not be moved: the reroute command always failed. But when we moved other shards on this node, even bigger ones, the moves always succeeded.
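For anyone trying to reproduce the move, here is a minimal Python sketch of how such a reroute request can be built. It only constructs the URL and JSON body for `POST /_cluster/reroute` with a `move` command and does not send anything. The host, index name, shard number, and node names are placeholders, not values from our cluster. The `explain` and `dry_run` query parameters ask the cluster to report its allocation decisions without actually moving the shard, which may help show why a move is rejected.

```python
import json
from urllib.parse import urlencode


def build_move_request(host, index, shard, from_node, to_node,
                       explain=True, dry_run=True):
    """Return (url, body) for a cluster reroute 'move' command.

    All argument values are supplied by the caller; nothing is sent.
    """
    params = {}
    if explain:
        params["explain"] = "true"   # include allocation decision explanations
    if dry_run:
        params["dry_run"] = "true"   # simulate only, do not move the shard
    query = "?" + urlencode(params) if params else ""
    url = f"{host.rstrip('/')}/_cluster/reroute{query}"
    body = {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    url, body = build_move_request(
        "http://localhost:9200", "my-index", 3, "node-a", "node-b"
    )
    print(url)
    print(body)
```

The body would be sent as a POST with `Content-Type: application/json`. Dropping `dry_run` performs the real move.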
So we concluded that this shard had been corrupted, even though it showed as healthy. After we removed this index completely, the node returned to normal.
Has anyone encountered this kind of problem before?
Any ideas are appreciated.
The ES version is 5.4.1. There are no related logs about this crash. We only observed that some memory dumps were generated at that moment.
Unfortunately, due to system limitations, the dump contains only a few lines.
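If the dump is a HotSpot JVM fatal error log (a file named like `hs_err_pid<pid>.log`), which is only an assumption here since we have not confirmed what kind of file it is, its first few lines normally name the signal and the problematic frame. A rough Python sketch for pulling out that header follows. The function names are made up for illustration, and the directory to search is whatever working directory the node was started from.

```python
import glob
import os


def find_hs_err_logs(directory):
    """Return sorted paths of HotSpot fatal error logs in a directory."""
    return sorted(glob.glob(os.path.join(directory, "hs_err_pid*.log")))


def read_hs_err_header(path):
    """Return the non-empty '#'-prefixed header lines at the top of the log.

    Reading stops at the first line that does not start with '#',
    so a truncated file still yields whatever header survived.
    """
    header = []
    with open(path, "r", errors="replace") as f:
        for line in f:
            if not line.startswith("#"):
                break
            text = line[1:].strip()
            if text:
                header.append(text)
    return header


if __name__ == "__main__":
    # Searches the current directory; adjust to where the dump was written.
    for log_path in find_hs_err_logs("."):
        print(log_path)
        for entry in read_hs_err_header(log_path):
            print("  " + entry)
```

Even a truncated file may still contain the signal line and the "Problematic frame" entry, which would indicate whether the crash happened in native code.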
Is it possible that a shard that looks healthy but contains abnormal data could cause the node to crash?