Hi, we are running ES 2.0 on a 4-node cluster. ES crashes 1-5 times a day, on different nodes and for different reasons. Can someone please suggest why this is happening and how we can get out of this situation? The crashes happen on all 4 nodes, every 6-12 hours. After a crash the cluster works normally: the state becomes green and all shards are in the STARTED state, until the next crash.
I am pasting one exception below.
There are other exceptions as well, such as ShardNotFoundException and java.nio.file.NoSuchFileException, which can be found here: http://pastebin.com/Y0fPULeL
Thanks in advance.
[2016-10-06 21:05:10,347][WARN ][action.bulk ] [130591932414] [cfileindex][5] failed to perform indices:data/write/bulk[s] on node {130591932198}{E4qFPw3-TvizCAeB6ai_lw}{10.2.34.115}{10.2.34.115:25800}{master=true}
TransportException[transport stopped, action: indices:data/write/bulk[s][r]]
at org.elasticsearch.transport.TransportService$2.run(TransportService.java:198)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-10-06 21:05:10,348][WARN ][cluster.action.shard ] [130591932414] failed to send failed shard to {130591932198}{E4qFPw3-TvizCAeB6ai_lw}{10.2.34.115}{10.2.34.115:25800}{master=true}
SendRequestTransportException[[130591932198][10.2.34.115:25800][internal:cluster/shard/failure]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:282)
at org.elasticsearch.cluster.action.shard.ShardStateAction.innerShardFailed(ShardStateAction.java:98)
at org.elasticsearch.cluster.action.shard.ShardStateAction.shardFailed(ShardStateAction.java:88)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase$1.handleException(TransportReplicationAction.java:895)
at org.elasticsearch.transport.TransportService$2.run(TransportService.java:198)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 8 more
[2016-10-06 21:05:10,348][WARN ][cluster.action.shard ] [130591932414] failed to send failed shard to {130591932198}{E4qFPw3-TvizCAeB6ai_lw}{10.2.34.115}{10.2.34.115:25800}{master=true}
SendRequestTransportException[[130591932198][10.2.34.115:25800][internal:cluster/shard/failure]]; nested: TransportException[TransportService is closed stopped can't send request];