ES Stops Working Randomly


I have 2 ES servers that are fed by 1 Logstash server, and I view the logs
in Kibana. This is a POC to work out any issues before going into
production. The system has run for ~1 month, and every few days Kibana
stops showing logs at some random time in the middle of the night. Last
night, the last log entry I received in Kibana was around 18:30. When I
checked the ES servers, the master showed as running and the secondary as
not running (per /sbin/service elasticsearch status), but I was able to do
a curl against localhost and it returned information, so I'm not sure
what's up with that. Anyway, when I check the cluster health on the master
node, I get this:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
"cluster_name" : "gis-elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 2,
"active_primary_shards" : 186,
"active_shards" : 194,
"relocating_shards" : 0,
"initializing_shards" : 7,
"unassigned_shards" : 249

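To put those shard counts in perspective, here is a minimal Python sketch
that works out what fraction of the shards is actually active. The helper
name summarize_health is made up for illustration and is not part of any
Elasticsearch client; it also assumes relocating shards are already counted
in active_shards.

```python
def summarize_health(health):
    """Return (total_shards, active_fraction) from a _cluster/health dict.

    Assumption: relocating shards are already included in active_shards,
    so the total is active + initializing + unassigned.
    """
    active = health["active_shards"]
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    fraction = active / total if total else 1.0
    return total, fraction


# Values copied from the health output above.
health = {
    "status": "red",
    "active_primary_shards": 186,
    "active_shards": 194,
    "relocating_shards": 0,
    "initializing_shards": 7,
    "unassigned_shards": 249,
}

total, fraction = summarize_health(health)
print("%d of %d shards active (%.1f%%)" % (health["active_shards"], total, fraction * 100))
```

With the numbers above, that is 194 active shards out of 450, so well under
half of the cluster's shards are available.
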
When I view the indexes via "ls ...nodes/0/indices/", it shows all indexes
as modified today for some reason, and there are new files with today's
date. So I think I'm starting to catch back up after I restarted both
servers, but I'm not sure why it failed in the first place. When I look at
the logs on the master, I only see 4 warnings at 18:57 and then the
secondary leaving the cluster. I don't see anything in the logs on the
secondary (Pistol) about why it stopped working or what truly happened.

[2014-03-06 18:57:04,121][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147630]
[2014-03-06 18:57:04,124][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147717]
[2014-03-06 18:57:04,124][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147718]
[2014-03-06 18:57:04,124][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147721]

[2014-03-06 19:56:08,467][INFO ][cluster.service ] [ElasticSearch
Server1] removed
data=false},}, reason:
data=false}), reason failed to ping, tried [3] times, each with maximum
[30s] timeout
[2014-03-06 19:56:12,304][INFO ][cluster.service ] [ElasticSearch
Server1] added
data=false},}, reason: zen-disco-receive(join from

Any ideas on additional logging or troubleshooting I can turn on to keep
this from happening in the future? Since the shards are not caught up,
right now I'm just seeing a lot of debug messages about "failed to parse".
I'm assuming that will be corrected once we catch up.
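
One cheap form of additional logging would be an external watcher that
polls the same _cluster/health endpoint as the curl call earlier and writes
a timestamped line whenever the status or node count changes, so the next
outage can be pinned to a time. The following is only a rough sketch under
that assumption: the function names, the 60-second interval, and the choice
of watched fields are all illustrative, and it uses nothing beyond the
Python standard library.

```python
import json
import time
import urllib.request

# Same endpoint as the curl command earlier in this post.
HEALTH_URL = "http://localhost:9200/_cluster/health"
# Fields to watch; an illustrative choice, adjust as needed.
WATCHED_KEYS = ("status", "number_of_nodes", "number_of_data_nodes")


def fetch_health(url=HEALTH_URL, timeout=10):
    """Fetch and decode the cluster health JSON document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def snapshot(health):
    """Keep only the fields we want to compare between polls."""
    return {key: health.get(key) for key in WATCHED_KEYS}


def watch(fetch=fetch_health, interval=60, max_polls=None,
          sleep=time.sleep, log=print):
    """Poll cluster health and log a line whenever a watched field changes."""
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            current = snapshot(fetch())
        except (OSError, ValueError) as exc:
            # Connection refused, timeout, or a response that is not JSON.
            current = {"status": "unreachable", "error": str(exc)}
        if current != previous:
            log("%s %s" % (time.strftime("%Y-%m-%d %H:%M:%S"), current))
            previous = current
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)
```

Calling watch() with no arguments polls forever once a minute; redirecting
its output to a file on each ES server would give a record of when the node
count dropped or the node stopped answering, which the ES logs here did not
show.
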

[2014-03-07 10:06:52,235][DEBUG][ ] [ElasticSearch
Server1] All shards failed for phase: [query]
[2014-03-07 10:06:52,223][DEBUG][ ] [ElasticSearch
Server1] [windows-2014.03.07][3], node[W6aEFbimR5G712ddG_G5yQ], [P],
s[STARTED]: Failed to execute
[] lastShard [true] [windows-2014.03.07][3]:
from[-1],size[-1]: Parse Failure [Failed to parse source
