This is the output of GET /_cat/nodes?v when all the nodes are running:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.22.107.20 2 54 9 1.48 0.68 0.33 mdi - 172.22.107.20:10000
172.22.107.22 3 65 9 0.15 0.20 0.17 mdi - 172.22.107.22:10000
172.22.107.21 2 54 11 0.30 0.26 0.14 mdi * 172.22.107.21:10000
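(As a quick aside on reading this output: the elected master is the row whose `master` column shows `*`. A small sketch that pulls it out of the sample above — the `nodes.txt` file name is just for illustration:)

```shell
# Save the _cat/nodes output shown above to a file (illustrative only).
cat <<'EOF' > nodes.txt
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.22.107.20            2          54   9    1.48    0.68     0.33 mdi       -      172.22.107.20:10000
172.22.107.22            3          65   9    0.15    0.20     0.17 mdi       -      172.22.107.22:10000
172.22.107.21            2          54  11    0.30    0.26     0.14 mdi       *      172.22.107.21:10000
EOF

# Column 9 is "master"; "*" marks the elected master node.
awk '$9 == "*" { print $1 }' nodes.txt   # prints 172.22.107.21
```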
And this is the output after the master is killed:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.22.107.20 1 54 5 0.88 0.69 0.36 mdi - 172.22.107.20:10000
172.22.107.22 2 65 17 0.37 0.22 0.18 mdi * 172.22.107.22:10000
The nodes still respond to the command GET /_cat/nodes?v,
but there is no response to:
curl -XGET 'localhost:9200/_cluster/health?pretty'
Also, after restarting the killed master, it never joins the cluster. This is the log of the killed master after restarting:
[2018-06-07T09:56:40,416][INFO ][o.e.d.DiscoveryModule ] [172.22.107.21:10000] using discovery type [zen]
[2018-06-07T09:56:41,644][INFO ][o.e.n.Node ] [172.22.107.21:10000] initialized
[2018-06-07T09:56:41,644][INFO ][o.e.n.Node ] [172.22.107.21:10000] starting ...
[2018-06-07T09:56:41,835][INFO ][o.e.t.TransportService ] [172.22.107.21:10000] publish_address {172.22.107.21:9300}, bound_addresses {[::]:9300}
[2018-06-07T09:56:41,859][INFO ][o.e.b.BootstrapChecks ] [172.22.107.21:10000] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-06-07T09:56:45,984][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:57:11,907][WARN ][o.e.n.Node ] [172.22.107.21:10000] timed out while waiting for initial discovery state - timeout: 30s
[2018-06-07T09:57:11,920][INFO ][o.e.h.n.Netty4HttpServerTransport] [172.22.107.21:10000] publish_address {172.22.107.21:9200}, bound_addresses {[::]:9200}
[2018-06-07T09:57:11,920][INFO ][o.e.n.Node ] [172.22.107.21:10000] started
[2018-06-07T09:57:15,988][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-06-07T09:57:16,027][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:57:45,018][INFO ][o.e.d.z.ZenDiscovery ] [172.22.107.21:10000] failed to send join request to master [{172.22.107.22:10000}{ds3DRQbwR2qQg7S9x-ljfw}{CTegKdEiQh2i2olrKdibrg}{172.22.107.22}{172.22.107.22:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-06-07T09:57:46,029][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [172.22.107.21:10000] timed out while retrying [indices:admin/exists] after failure (timeout [30s])
[2018-06-07T09:57:51,042][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:58:10,074][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:58:21,044][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-06-07T09:58:21,058][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:58:40,076][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-06-07T09:58:40,080][WARN ][r.suppressed ] path: /_cluster/health, params: {pretty=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) [elasticsearch-5.6.7.jar:5.6.7]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.7.jar:5.6.7]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.7.jar:5.6.7]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.7.jar:5.6.7]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.7.jar:5.6.7]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
[2018-06-07T09:58:48,030][INFO ][o.e.d.z.ZenDiscovery ] [172.22.107.21:10000] failed to send join request to master [{172.22.107.22:10000}{ds3DRQbwR2qQg7S9x-ljfw}{CTegKdEiQh2i2olrKdibrg}{172.22.107.22}{172.22.107.22:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-06-07T09:58:51,059][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [172.22.107.21:10000] timed out while retrying [indices:admin/exists] after failure (timeout [30s])
[2018-06-07T09:58:56,082][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:59:26,083][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-06-07T09:59:26,093][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [172.22.107.21:10000] no known master node, scheduling a retry
[2018-06-07T09:59:51,040][INFO ][o.e.d.z.ZenDiscovery ] [172.22.107.21:10000] failed to send join request to master [{172.22.107.22:10000}{ds3DRQbwR2qQg7S9x-ljfw}{CTegKdEiQh2i2olrKdibrg}{172.22.107.22}{172.22.107.22:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-06-07T09:59:56,094][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [172.22.107.21:10000] timed out while retrying [indices:admin/exists] after failure (timeout [30s])
[2018-06-07T10:00:01,109][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [172.22.107.21:10000] no known master node, scheduling a retry
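(For context: the repeated "failed to send join request to master" and "no known master node" messages come from zen discovery on 5.x, where master election and node joins depend on the configured quorum. A minimal elasticsearch.yml sketch of the settings I would expect for this cluster — the host list is assumed from the IPs in the logs, not taken from my actual config:)

```yaml
# Assumed settings for a 3-node cluster with the IPs seen in the logs above.
# With 3 master-eligible nodes, the quorum is (3 / 2) + 1 = 2, so a master
# can still be elected after one node is killed.
discovery.zen.ping.unicast.hosts: ["172.22.107.20", "172.22.107.21", "172.22.107.22"]
discovery.zen.minimum_master_nodes: 2
```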