First post.
History
- This is our production instance, and it has now been down for 5 days.
- I have inherited a system that has been running for years without issue.
- The system has not been updated, so I find it very surprising that this problem has arisen.
- I have also rebuilt the system completely from scratch (using the BOSH setup scripts), and the problem remains.
- I have never set up ES before; I am learning by reading the logs.
Versions
Running inside Amazon (AWS).
# ./elasticsearch --version
Version: 5.5.2, Build: b2f0c09/2017-08-14T12:33:14.154Z, JVM: 1.8.0_141
VMs
- ingestor/0
- kibana/0
- maintenance/0
- cluster_monitor/0
- elasticsearch_data/0
- elasticsearch_master/0
- ls-router/0
- elasticsearch_data/1
- elasticsearch_data/2
Details
I have tried starting the ES services on one VM at a time to understand the sequence of events.
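Because each VM writes its own elasticsearch.stdout.log, the sequence of events is easier to follow if the logs are merged into one timeline. Here is a minimal Python sketch, assuming every entry starts with the `[timestamp][LEVEL ][logger ]` prefix these 5.x logs use and that the VM clocks are in sync; continuation lines such as stack-trace frames are skipped. The function names and the `{source: lines}` input shape are hypothetical, not part of any Elasticsearch tooling.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Prefix seen in these logs, e.g.
# [2019-09-13T23:36:45,293][INFO ][o.e.c.s.ClusterService ] [elasticsearch_master/0] new_master ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+?)\s*\]\s*(?P<msg>.*)$"
)

Event = Tuple[datetime, str, str, str]  # (timestamp, source, level, message)


def parse_line(line: str) -> Optional[Tuple[datetime, str, str, str]]:
    """Return (timestamp, level, logger, message), or None for lines without the prefix."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # e.g. "    at org.elasticsearch..." stack-trace frames
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
    return ts, m.group("level"), m.group("logger"), m.group("msg")


def merge_logs(logs: Dict[str, Iterable[str]]) -> List[Event]:
    """Merge {source_name: lines} from several VMs into one list sorted by timestamp."""
    events: List[Event] = []
    for source, lines in logs.items():
        for line in lines:
            parsed = parse_line(line)
            if parsed is not None:
                ts, level, _logger, msg = parsed
                events.append((ts, source, level, msg))
    events.sort(key=lambda e: e[0])
    return events
```

Usage would look like `merge_logs({"kibana/0": open("kibana_es.log"), "master/0": open("master_es.log")})`, where the file names are hypothetical local copies of each VM's log.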
- Started Kibana process
elasticsearch.stdout.log
[2019-09-13T23:33:27,583][INFO ][o.e.n.Node ] [kibana/0] initialized
[2019-09-13T23:33:27,584][INFO ][o.e.n.Node ] [kibana/0] starting ...
[2019-09-13T23:33:27,856][INFO ][o.e.t.TransportService ] [kibana/0] publish_address {10.249.1.127:9300}, bound_addresses {0.0.0.0:9300}
[2019-09-13T23:33:27,882][INFO ][o.e.b.BootstrapChecks ] [kibana/0] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2019-09-13T23:33:30,973][WARN ][o.e.d.z.ZenDiscovery ] [kibana/0] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
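The `needed [1]` in the ZenDiscovery warning is the `discovery.zen.minimum_master_nodes` value in effect on kibana/0, and `found [[]]` means it discovered no master-eligible node while pinging. Elastic's guidance for Zen discovery in 5.x is to set this to a quorum of master-eligible nodes, (N / 2) + 1 with integer division. A minimal sketch of that arithmetic, assuming elasticsearch_master/0 is the only master-eligible VM (the node roles are not shown in this post); `zen_quorum` is a hypothetical helper name:

```python
def zen_quorum(master_eligible_nodes: int) -> int:
    """Recommended discovery.zen.minimum_master_nodes: a strict majority of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "master-eligible ->", zen_quorum(n))
```

With a single master-eligible node the quorum is 1, consistent with `needed [1]`; two master-eligible nodes would need 2, so losing either one would block master election.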
# tail -f kibana.stdout.log
{"type":"log","@timestamp":"2019-09-13T23:33:54Z","tags":["warning","elasticsearch","admin"],"pid":15331,"message":"No living connections"}
{"type":"log","@timestamp":"2019-09-13T23:33:57Z","tags":["warning","elasticsearch","admin"],"pid":15331,"message":"Unable to revive connection: http://127.0.0.1:9200/"}
{"type":"log","@timestamp":"2019-09-13T23:33:57Z","tags":["warning","elasticsearch","admin"],"pid":15331,"message":"No living connections"}
{"type":"log","@timestamp":"2019-09-13T23:33:59Z","tags":["status","plugin:elasticsearch@5.5.2","error"],"pid":15331,"state":"red","message":"Status changed from red to red - Service Unavailable","prevState":"red","prevMsg":"Unable to connect to Elasticsearch at http://127.0.0.1:9200."}
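Kibana's stdout log is one JSON object per line, so the Elasticsearch-related warnings and errors can be pulled out programmatically instead of by eye. A minimal Python sketch, assuming only the fields visible in the `tail -f` output (`@timestamp`, `tags`, `message`); the function name is hypothetical, and lines that are not complete JSON objects are skipped.

```python
import json
from typing import Any, Dict, Iterable, List


def kibana_es_problems(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Return {time, message} for warning/error entries about Elasticsearch."""
    hits: List[Dict[str, Any]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or wrapped line
        if not isinstance(entry, dict):
            continue
        tags = entry.get("tags", [])
        about_es = "elasticsearch" in tags or any(
            str(t).startswith("plugin:elasticsearch") for t in tags
        )
        is_problem = "warning" in tags or "error" in tags
        if about_es and is_problem:
            hits.append({"time": entry.get("@timestamp"), "message": entry.get("message")})
    return hits
```

Running it over the whole file (for example `kibana_es_problems(open("kibana.stdout.log"))`) keeps the status-change entries such as `Service Unavailable` alongside the connection warnings.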
OK, this makes sense, as I believe the master node is responsible for opening port 9200.
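To check from a VM whether anything is accepting connections on the HTTP port Kibana is trying to reach (`http://127.0.0.1:9200/` in the log above), a plain TCP connect is enough. A minimal Python sketch; the host and port come from the Kibana log, the 2-second timeout is an arbitrary choice, and `port_open` is a hypothetical helper. A successful connect only shows that something is listening, not that the cluster is healthy.

```python
import socket


def port_open(host: str = "127.0.0.1", port: int = 9200, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


if __name__ == "__main__":
    print("9200 open" if port_open() else "nothing listening on 127.0.0.1:9200")
```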
Master
Started ES on the master.
Everything looks good until the following:
[2019-09-13T23:36:28,654][INFO ][o.e.p.PluginsService ] [elasticsearch_master/0] no plugins loaded
[2019-09-13T23:36:28,748][WARN ][o.e.d.c.s.Setting ] [path.scripts] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2019-09-13T23:36:32,345][WARN ][o.e.d.c.s.Setting ] [path.scripts] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2019-09-13T23:36:36,475][INFO ][o.e.d.DiscoveryModule ] [elasticsearch_master/0] using discovery type [zen]
[2019-09-13T23:36:41,243][INFO ][o.e.n.Node ] [elasticsearch_master/0] initialized
[2019-09-13T23:36:41,244][INFO ][o.e.n.Node ] [elasticsearch_master/0] starting ...
[2019-09-13T23:36:41,928][INFO ][o.e.t.TransportService ] [elasticsearch_master/0] publish_address {10.249.1.121:9300}, bound_addresses {0.0.0.0:9300}
[2019-09-13T23:36:42,018][INFO ][o.e.b.BootstrapChecks ] [elasticsearch_master/0] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2019-09-13T23:36:45,293][INFO ][o.e.c.s.ClusterService ] [elasticsearch_master/0] new_master {elasticsearch_master/0}{7teHHoZbS7C-p45hBlnuKg}{v9veDuo7SMqKf6ZWmi9esg}{10.249.1.121}{10.249.1.121:9300}, added {{kibana/0}{iSU1a7nRQXaJQVlMSsLbqA}{TPcWJyC-T6mI3lOVLrRRQg}{10.249.1.127}{10.249.1.127:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{kibana/0}{iSU1a7nRQXaJQVlMSsLbqA}{TPcWJyC-T6mI3lOVLrRRQg}{10.249.1.127}{10.249.1.127:9300}]
[2019-09-13T23:36:45,484][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch_master/0] publish_address {10.249.1.121:9200}, bound_addresses {0.0.0.0:9200}
[2019-09-13T23:36:45,484][INFO ][o.e.n.Node ] [elasticsearch_master/0] started
[2019-09-13T23:36:49,939][INFO ][o.e.g.GatewayService ] [elasticsearch_master/0] recovered [9] indices into cluster_state
[2019-09-13T23:36:52,050][WARN ][r.suppressed ] path: /_bulk, params: {}
java.lang.IllegalStateException: There are no ingest nodes in this cluster, unable to forward request to an ingest node.
        at org.elasticsearch.action.ingest.IngestActionForwarder.randomIngestNode(IngestActionForwarder.java:58) ~[elasticsearch-5.5.2.jar:5.5.2]
Continued in a reply to this post due to the post size limit.