Transport client node failure

(shengcer) #1

Suppose Elasticsearch is configured to run on two nodes, both of which are master-eligible data nodes. My program initializes a transport client that connects to both of these nodes. Then one node fails, either because the network is slow or because the node itself is dead. Meanwhile, Elasticsearch is executing a scheduled job that indexes a large amount of data into the cluster. In this case the transport client will, of course, complain that one node is dead. What really concerns me is that the whole cluster could be left in a broken state. Below is a sample of the messages I got in this case. What can I do to prevent this from happening?

WARNING: [Blackout] [coverage-elastic1345266122391][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [coverage-elastic1345266122391][0] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(
    at org.elasticsearch.index.gateway.IndexShardGatewayService$
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
    at java.util.concurrent.ThreadPoolExecutor$
