I performed a rolling upgrade of our 3-node ES cluster from 5.6.3 to 6.0.0. During the upgrade, two of the three nodes discovered each other, but the third node still cannot discover the master, and the cluster state has been red ever since.
Here are the settings for ES:
cluster.name: "es-at-221b"
network.host: 0.0.0.0
network.publish_host: _ec2:privateIp_
cloud.node.auto_attributes: true
discovery:
  zen:
    hosts_provider: ec2
    minimum_master_nodes: 2
  ec2:
    availability_zones: us-west-2a
    tag.system: es-at-221b-nodes
    host_type: "private_ip"
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.ml.enabled: false
xpack.graph.enabled: false
xpack.watcher.enabled: false
bootstrap.memory_lock: false
The nodes run on Amazon Linux (Amazon Linux AMI 2017.09.0.20170930 x86_64 HVM), inside a Docker container with ports 9200 and 9300 exposed and bound to the host.
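Since discovery depends on the transport port (9300) being reachable between hosts, a plain TCP check from the failing node can help rule out a networking problem. This is only a minimal sketch: the helper names `can_connect` and `check_peers` are my own, and the peer addresses have to be filled in with the private IPs of the other two nodes.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, port: int = 9300, timeout: float = 3.0) -> dict:
    """Map each peer address to whether its transport port accepted a connection."""
    return {peer: can_connect(peer, port, timeout) for peer in peers}
```

Calling `check_peers([...])` with the two other nodes' private IPs returns a dict of address to `True`/`False`. A successful connection only shows that the port is open, not that the Elasticsearch transport handshake succeeds.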
Here are the logs:
[2017-11-16T22:16:59,063][INFO ][o.e.n.Node ] [] initializing ...
[2017-11-16T22:16:59,142][INFO ][o.e.e.NodeEnvironment ] [UwrqR1o] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/xvda1)]], net usable_space [100.0gb], net total_space [100.0gb], types [ext4]
[2017-11-16T22:16:59,142][INFO ][o.e.e.NodeEnvironment ] [UwrqR1o] heap size [15.9gb], compressed ordinary object pointers [true]
[2017-11-16T22:16:59,144][INFO ][o.e.n.Node ] node name [UwrqR1o] derived from node ID [UwrqR1onT0K2wTs2IYxA2A]; set [node.name] to override
[2017-11-16T22:16:59,144][INFO ][o.e.n.Node ] version[6.0.0], pid[1], build[8f0685b/2017-11-10T18:41:22.859Z], OS[Linux/4.9.58-18.55.amzn1.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2017-11-16T22:16:59,144][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.cgroups.hierarchy.override=/, -Xms16g, -Xmx16g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
[2017-11-16T22:17:00,963][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [aggs-matrix-stats]
[2017-11-16T22:17:00,963][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [analysis-common]
[2017-11-16T22:17:00,963][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [ingest-common]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [lang-expression]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [lang-mustache]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [lang-painless]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [parent-join]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [percolator]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [reindex]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [repository-url]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [transport-netty4]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded module [tribe]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded plugin [discovery-ec2]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded plugin [ingest-geoip]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded plugin [ingest-user-agent]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService ] [UwrqR1o] loaded plugin [x-pack]
[2017-11-16T22:17:03,245][INFO ][o.e.d.DiscoveryModule ] [UwrqR1o] using discovery type [zen]
[2017-11-16T22:17:03,955][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2017-11-16T22:17:03,964][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2017-11-16T22:17:04,107][INFO ][o.e.n.Node ] initialized
[2017-11-16T22:17:04,107][INFO ][o.e.n.Node ] [UwrqR1o] starting ...
[2017-11-16T22:17:04,242][INFO ][o.e.t.TransportService ] [UwrqR1o] publish_address {xxx.xx.xx.xxx:9300}, bound_addresses {0.0.0.0:9300}
[2017-11-16T22:17:04,260][INFO ][o.e.b.BootstrapChecks ] [UwrqR1o] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-16T22:17:07,944][WARN ][o.e.d.z.ZenDiscovery ] [UwrqR1o] not enough master nodes discovered during pinging (found [[Candidate{node={UwrqR1o}{UwrqR1onT0K2wTs2IYxA2A}{bQ7yNXVfTiS9kqv7CZrsNQ}{xxx.xx.xx.xxx}{xxx.xx.xx.xxx:9300}{aws_availability_zone=us-west-2a}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-11-16T22:17:11,039][WARN ][o.e.d.z.ZenDiscovery ] [UwrqR1o] not enough master nodes discovered during pinging (found [[Candidate{node={UwrqR1o}{UwrqR1onT0K2wTs2IYxA2A}{bQ7yNXVfTiS9kqv7CZrsNQ}{xxx.xx.xx.xxx}{xxx.xx.xx.xxx:9300}{aws_availability_zone=us-west-2a}, clusterStateVersion=-1}]], but needed [2]), pinging again
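For context, the "needed [2]" in the warning matches `minimum_master_nodes: 2`, which is the majority for three master-eligible nodes, while the node only finds itself as a candidate. The arithmetic, as a small sketch (the function name `quorum` is just illustrative, and I am assuming all three nodes are master-eligible since `node.master` is not overridden in the settings):

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Three master-eligible nodes need two candidates to elect a master.
assert quorum(3) == 2
```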
Can anyone point me in the right direction?