Node in cluster went into a bad state - V 2.2.0


(Nemo) #1

Hi All,
Recently one of the nodes in my cluster went into a bad state that I have never seen before. There are three nodes in the cluster, and it had been in a good state for a month. I run all operations, such as creating and deleting indices, against a single node. Suddenly this node went into a bad state with the following behavior:

  1. Index creation through the client worked fine, but the delete operation failed with IndexNotFoundException even though the index was actually present.

  2. Creating and deleting an alias through the client failed, but through curl I was able to create and delete the alias.

  3. I saw tons of the exceptions below in the Elasticsearch log:

     Caused by: [indexname][[indexname][3]] CreateFailedEngineException[Create failed for [indexname#AVUnhw1tQK5-eQ90nvsB]]; nested: NoSuchFileException[/opt/data/elasticsearch/cluster/nodes/0/indices/indexname/3/index/write.lock];
    at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:367)
    at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:515)
    at org.elasticsearch.index.engine.Engine$Create.execute(Engine.java:810)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnReplica(TransportIndexAction.java:195)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:436)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:68)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:365)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:270)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:267)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    

    Caused by: java.nio.file.NoSuchFileException: /opt/data/elasticsearch/cluster/nodes/0/indices/indexname/3/index/write.lock

    [2016-06-20 12:10:42,159][WARN ][gateway ] [Node0] [indexName][0]: failed to list shard for shard_store on node [Hs_tcXRfR9OVMPNzCIV53w]
    FailedNodeException[Failed node [Hs_tcXRfR9OVMPNzCIV53w]]; nested: RemoteTransportException[[Node3][10.13.96.29:9300][internal:cluster/nodes/indices/shard/store[n]]]; nested: IllegalStateException[[indexName][0] index UUID in shard state was: hcPBOHlYTw2cLuQLgoCbqQ expected: 8foEM_sNQcaiOLf0hhByrQ on shard path: /opt/data/elasticsearch/cluster/nodes/0/indices/indexName/0];

  4. The same create and delete operations work fine through curl.
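
To find out which shards are hit by the index UUID mismatch from item 3, a small script along these lines could scan the log. This is only a rough sketch: the function name `find_uuid_mismatches` and the regular expression are my own assumptions, derived from the single log line quoted above, and are not part of Elasticsearch.

```python
import re
from typing import Dict, List

# Pattern written against the one IllegalStateException line quoted above;
# other Elasticsearch versions may word this message differently.
UUID_MISMATCH = re.compile(
    r"\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\] "
    r"index UUID in shard state was: (?P<found>\S+) "
    r"expected: (?P<expected>\S+) "
    r"on shard path: (?P<path>[^\]]+)\]"
)


def find_uuid_mismatches(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per 'index UUID in shard state' mismatch in the log text."""
    return [match.groupdict() for match in UUID_MISMATCH.finditer(log_text)]


# Sample taken from the log excerpt in this post.
SAMPLE = (
    "IllegalStateException[[indexName][0] index UUID in shard state was: "
    "hcPBOHlYTw2cLuQLgoCbqQ expected: 8foEM_sNQcaiOLf0hhByrQ on shard path: "
    "/opt/data/elasticsearch/cluster/nodes/0/indices/indexName/0];"
)

if __name__ == "__main__":
    for hit in find_uuid_mismatches(SAMPLE):
        print(hit["index"], hit["shard"], hit["found"], hit["expected"], hit["path"])
```

A shard directory whose on-disk UUID differs from the expected one would suggest that data from an older index of the same name is still on that node, though I have not confirmed that this is the cause here.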

Please let me know if you need more information.

