Hi All,
Recently one of the nodes in my cluster went into a bad state that I have never seen before. There are three nodes in the cluster, and the cluster had been healthy for a month. I run all operations, such as creating and deleting indices, against a single node. Suddenly this node went into a bad state with the following behavior.
- Creating an index through the client worked, but deleting it failed with IndexNotFoundException, even though the index actually existed.
- Creating and deleting aliases failed through the client, but through curl I was able to create and delete them.
- The Elasticsearch log was full of exceptions like the ones below:
Caused by: [indexname][[indexname][3]] CreateFailedEngineException[Create failed for [indexname#AVUnhw1tQK5-eQ90nvsB]]; nested: NoSuchFileException[/opt/data/elasticsearch/cluster/nodes/0/indices/indexname/3/index/write.lock];
    at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:367)
    at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:515)
    at org.elasticsearch.index.engine.Engine$Create.execute(Engine.java:810)
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnReplica(TransportIndexAction.java:195)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:436)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:68)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:365)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:270)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:267)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: /opt/data/elasticsearch/cluster/nodes/0/indices/indexname/3/index/write.lock
[2016-06-20 12:10:42,159][WARN ][gateway ] [Node0] [indexName][0]: failed to list shard for shard_store on node [Hs_tcXRfR9OVMPNzCIV53w]
FailedNodeException[Failed node [Hs_tcXRfR9OVMPNzCIV53w]]; nested: RemoteTransportException[[Node3][10.13.96.29:9300][internal:cluster/nodes/indices/shard/store[n]]]; nested: IllegalStateException[[indexName][0] index UUID in shard state was: hcPBOHlYTw2cLuQLgoCbqQ expected: 8foEM_sNQcaiOLf0hhByrQ on shard path: /opt/data/elasticsearch/cluster/nodes/0/indices/indexName/0];
- The same create and delete operations work fine through curl.
Please let me know if you need more information.
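
In case it helps with triage: the last log entry above reports that the index UUID stored in the shard state differs from the one the cluster expects. Below is a minimal Python sketch for pulling those UUID mismatches out of a log file. It is not part of my setup; the function name `find_uuid_mismatches` and the regular expression are my own assumptions, based only on the message format shown in the log above.

```python
import re

# Matches messages of the form seen in the log above (assumed format):
# [index][shard] index UUID in shard state was: X expected: Y on shard path: Z
UUID_MISMATCH = re.compile(
    r"\[(?P<index>[^\]\[]+)\]\[(?P<shard>\d+)\] index UUID in shard state was: "
    r"(?P<found>\S+) expected: (?P<expected>\S+) on shard path: (?P<path>[^\]\s]+)"
)


def find_uuid_mismatches(log_text):
    """Return one dict per UUID-mismatch message found in log_text.

    Hypothetical helper: each dict has the keys index, shard, found,
    expected and path, all as strings.
    """
    return [m.groupdict() for m in UUID_MISMATCH.finditer(log_text)]


if __name__ == "__main__":
    import sys

    # Usage: python find_mismatches.py /path/to/elasticsearch.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hit in find_uuid_mismatches(fh.read()):
            print(hit["index"], hit["shard"], hit["found"], hit["expected"], hit["path"])
```

Running it over the node's log should list every shard path whose on-disk UUID does not match the expected one, which makes it easier to see whether only one index is affected.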