Cannot Discover Master Node after upgrade to ES 6.0.0

shreyask · November 16, 2017, 9:59pm

I did a rolling upgrade of our 3-node ES cluster from 5.6.3 to 6.0.0. During the upgrade, two of the three nodes discovered each other, but the third node still cannot discover the master, and the cluster state has been red ever since.

Here are the settings for ES:

cluster.name: "es-at-221b"
network.host: 0.0.0.0
network.publish_host: _ec2:privateIp_
cloud.node.auto_attributes: true
discovery:
    zen:
      hosts_provider: ec2
      minimum_master_nodes: 2
    ec2:
      availability_zones: us-west-2a
      tag.system: es-at-221b-nodes
      host_type: "private_ip"
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.ml.enabled: false
xpack.graph.enabled: false
xpack.watcher.enabled: false
bootstrap.memory_lock: false
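
For context on minimum_master_nodes: 2 in the settings above: the usual guidance for zen discovery is a majority of the master-eligible nodes, i.e. (N / 2) + 1 with integer division. A minimal sketch of that arithmetic, assuming all three nodes are master-eligible (the function name is only illustrative):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum for zen discovery: floor(N / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Assumption: a 3-node cluster where every node is master-eligible.
    print(minimum_master_nodes(3))
```

With a quorum of 2, a node that can only see itself keeps pinging rather than electing itself master.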

Running on Amazon Linux (Amazon Linux AMI 2017.09.0.20170930 x86_64 HVM), inside a Docker container with ports 9200 and 9300 exposed and bound to the host.

Here are the logs:

[2017-11-16T22:16:59,063][INFO ][o.e.n.Node               ] [] initializing ...
[2017-11-16T22:16:59,142][INFO ][o.e.e.NodeEnvironment    ] [UwrqR1o] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/xvda1)]], net usable_space [100.0gb], net total_space [100.0gb], types [ext4]
[2017-11-16T22:16:59,142][INFO ][o.e.e.NodeEnvironment    ] [UwrqR1o] heap size [15.9gb], compressed ordinary object pointers [true]
[2017-11-16T22:16:59,144][INFO ][o.e.n.Node               ] node name [UwrqR1o] derived from node ID [UwrqR1onT0K2wTs2IYxA2A]; set [node.name] to override
[2017-11-16T22:16:59,144][INFO ][o.e.n.Node               ] version[6.0.0], pid[1], build[8f0685b/2017-11-10T18:41:22.859Z], OS[Linux/4.9.58-18.55.amzn1.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2017-11-16T22:16:59,144][INFO ][o.e.n.Node               ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.cgroups.hierarchy.override=/, -Xms16g, -Xmx16g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
[2017-11-16T22:17:00,963][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [aggs-matrix-stats]
[2017-11-16T22:17:00,963][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [analysis-common]
[2017-11-16T22:17:00,963][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [ingest-common]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [lang-expression]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [lang-mustache]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [lang-painless]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [parent-join]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [percolator]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [reindex]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [repository-url]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [transport-netty4]
[2017-11-16T22:17:00,964][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded module [tribe]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded plugin [discovery-ec2]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded plugin [ingest-geoip]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded plugin [ingest-user-agent]
[2017-11-16T22:17:00,965][INFO ][o.e.p.PluginsService     ] [UwrqR1o] loaded plugin [x-pack]
[2017-11-16T22:17:03,245][INFO ][o.e.d.DiscoveryModule    ] [UwrqR1o] using discovery type [zen]
[2017-11-16T22:17:03,955][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2017-11-16T22:17:03,964][WARN ][o.e.d.c.m.IndexTemplateMetaData] Deprecated field [template] used, replaced by [index_patterns]
[2017-11-16T22:17:04,107][INFO ][o.e.n.Node               ] initialized
[2017-11-16T22:17:04,107][INFO ][o.e.n.Node               ] [UwrqR1o] starting ...
[2017-11-16T22:17:04,242][INFO ][o.e.t.TransportService   ] [UwrqR1o] publish_address {xxx.xx.xx.xxx:9300}, bound_addresses {0.0.0.0:9300}
[2017-11-16T22:17:04,260][INFO ][o.e.b.BootstrapChecks    ] [UwrqR1o] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-16T22:17:07,944][WARN ][o.e.d.z.ZenDiscovery     ] [UwrqR1o] not enough master nodes discovered during pinging (found [[Candidate{node={UwrqR1o}{UwrqR1onT0K2wTs2IYxA2A}{bQ7yNXVfTiS9kqv7CZrsNQ}{xxx.xx.xx.xxx}{xxx.xx.xx.xxx:9300}{aws_availability_zone=us-west-2a}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2017-11-16T22:17:11,039][WARN ][o.e.d.z.ZenDiscovery     ] [UwrqR1o] not enough master nodes discovered during pinging (found [[Candidate{node={UwrqR1o}{UwrqR1onT0K2wTs2IYxA2A}{bQ7yNXVfTiS9kqv7CZrsNQ}{xxx.xx.xx.xxx}{xxx.xx.xx.xxx:9300}{aws_availability_zone=us-west-2a}, clusterStateVersion=-1}]], but needed [2]), pinging again

Can anyone point me in the right direction?
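
As a reading aid for the ZenDiscovery warning above, here is a rough sketch that pulls out how many candidates a node found versus how many it needed. It only pattern-matches the message text shown in this thread; the function name is hypothetical, and other Elasticsearch versions may word the message differently.

```python
import re

NEEDED_RE = re.compile(r"but needed \[(\d+)\]")

# Log line copied from the output above.
SAMPLE = (
    "[2017-11-16T22:17:07,944][WARN ][o.e.d.z.ZenDiscovery     ] [UwrqR1o] "
    "not enough master nodes discovered during pinging (found "
    "[[Candidate{node={UwrqR1o}{UwrqR1onT0K2wTs2IYxA2A}{bQ7yNXVfTiS9kqv7CZrsNQ}"
    "{xxx.xx.xx.xxx}{xxx.xx.xx.xxx:9300}{aws_availability_zone=us-west-2a}, "
    "clusterStateVersion=-1}]], but needed [2]), pinging again"
)


def summarize_ping_warning(line: str):
    """Return (candidates_found, candidates_needed), or None if the line
    is not a 'not enough master nodes' warning."""
    if "not enough master nodes discovered during pinging" not in line:
        return None
    match = NEEDED_RE.search(line)
    if match is None:
        return None
    # Each discovered candidate appears as one "Candidate{...}" entry.
    found = line.count("Candidate{")
    return found, int(match.group(1))


if __name__ == "__main__":
    print(summarize_ping_warning(SAMPLE))
```

For the line above, the only candidate listed is the node itself (UwrqR1o), which fits the EC2 host lookup returning no other nodes.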

warkolm · November 16, 2017, 10:29pm

Did you update the discovery plugin as well?

shreyask · November 16, 2017, 10:31pm

Yes, I did. It was a fresh Docker image for 6.0.0, on top of which I ran bin/elasticsearch-plugin install --batch discovery-ec2 to get the latest discovery plugin.

shreyask · November 16, 2017, 11:38pm

I was able to solve this issue.

Adding endpoint: ec2.us-west-2.amazonaws.com under discovery.ec2 let the nodes discover each other. What is weird is that the other two nodes were able to discover each other without this setting. I guess that was down to the saved state on the nodes themselves.

cluster.name: "es-at-221b"
network.host: 0.0.0.0
network.publish_host: _ec2:privateIp_
cloud.node.auto_attributes: true
discovery:
    zen:
      hosts_provider: ec2
      minimum_master_nodes: 2
    ec2:
      endpoint: ec2.us-west-2.amazonaws.com
      availability_zones: us-west-2a
      tag.system: es-at-221b-nodes
      host_type: "private_ip"
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.ml.enabled: false
xpack.graph.enabled: false
xpack.watcher.enabled: false
bootstrap.memory_lock: false
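
The endpoint value above follows the pattern ec2.<region>.amazonaws.com, and the region is the availability zone minus its trailing letter. A small sketch of that derivation, assuming standard commercial regions and ordinary zone names such as us-west-2a (the helper name is hypothetical, and other partitions use different endpoint domains):

```python
import re

# Region followed by a single zone letter, e.g. "us-west-2" + "a".
AZ_RE = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d+)[a-z]$")


def ec2_endpoint_for_az(availability_zone: str) -> str:
    """Build the regional EC2 endpoint hostname from an availability zone."""
    match = AZ_RE.match(availability_zone)
    if match is None:
        raise ValueError(f"unrecognized availability zone: {availability_zone!r}")
    region = match.group(1)
    return f"ec2.{region}.amazonaws.com"


if __name__ == "__main__":
    print(ec2_endpoint_for_az("us-west-2a"))
```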

trondhindenes · November 25, 2017, 5:51pm

I had the same problem on the first ES6 node I added to our ES 5.6 cluster as part of a rolling update. This requirement should really be documented. As far as I can see from the logs, the ES6 node doesn't even attempt to use the ec2 discovery plugin without the endpoint setting defined.

warkolm · November 25, 2017, 7:11pm

@dadoonet are you able to comment on this?

dadoonet · November 25, 2017, 7:47pm

I think that if you don’t define the endpoint, Elasticsearch 5.6 should complain in the deprecation logs. So it should tell you that you need to change those settings before upgrading.

Did you see that @trondhindenes?

trondhindenes · November 25, 2017, 8:15pm

Did not check those logs, sorry. The upgrade helper did not say anything about it.
From the documentation I don't get the impression that endpoint is a required parameter - it basically says "all you need is to set zen discovery mode to ec2 and you're good". Maybe a better separation between required and optional parameters would make it easier. Also, Elasticsearch normally stops/restarts if there's anything wrong with the settings, but in this case it didn't. To me it feels like there's a fairly serious bug in the ec2 discovery plugin for ES6.

dadoonet · November 25, 2017, 10:51pm

The upgrade helper did not say anything about it.

Did you activate the deprecation logger?

From the documentation I don't get the impression that endpoint is a required parameter

You're right. If you are not using a specific key/secret, everything should be read from the instance metadata:

Using the EC2 discovery plugin | Elasticsearch Plugins and Integrations [8.11] | Elastic

It does not say that endpoint is mandatory: https://www.elastic.co/guide/en/elasticsearch/plugins/current/_settings.html

endpoint: The ec2 service endpoint to connect to. This will be automatically figured out by the ec2 client based on the instance location, but can be specified explicitly. See AWS service endpoints - AWS General Reference.

Could you share the settings you were using in 5.6?

Thanks!

trondhindenes · November 25, 2017, 11:41pm

These are the settings we used on 5.6 without any problems:

cloud:
    aws:
        region: "{{ es_aws_region }}"
cluster.name: "{{ es_cluster_name }}"
cluster.routing.allocation.awareness.attributes: az

node.data: {{ es_data_node_enabled }}
node.name: "{{ inventory_hostname | lower }}"
path.data: "{{ es_data_path }}"
path.logs: "{{ es_logs_path }}"

network.host: _site_
network.bind_host: {{ es_internal_bind_host }}

http.port: {{ es_internal_listen_port }}
node.attr.az: {{ ec2_metadata_az.stdout }}
discovery:
    zen.hosts_provider: ec2
    ec2:
        host_type: private_ip
        groups: {{ es_disc_sg }}
        any_group: false
node.max_local_storage_nodes: 1

trondhindenes · November 25, 2017, 11:44pm

For ES6, we had to change to:

cluster.name: "{{ es_cluster_name }}"
cluster.routing.allocation.awareness.attributes: az

node.data: {{ es_data_node_enabled }}
node.name: "{{ inventory_hostname | lower }}"
path.data: "{{ es_data_path }}"
path.logs: "{{ es_logs_path }}"

network.host: _site_
network.bind_host: {{ es_internal_bind_host }}

http.port: {{ es_internal_listen_port }}
node.attr.az: {{ ec2_metadata_az.stdout }}
discovery:
    zen.hosts_provider: ec2
    ec2:
      endpoint: ec2.eu-west-1.amazonaws.com
      host_type: private_ip
      groups: {{ es_disc_sg }}
      any_group: false
node.max_local_storage_nodes: 1

So in short, we got rid of the cloud section (that was clearly documented, and the first ES6 nodes also refused to start with it in the config, so that's all good). I was, however, surprised that the logs didn't show anything else related to discovery. My impression was that the node just skipped zen discovery altogether until I set the endpoint attribute.

Btw, we're using EC2 instance roles so no credentials are added to the config.
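
For anyone wanting to check a 5.6 config for this before upgrading, here is a rough sketch that flattens a nested settings dict and flags any cloud.aws.* keys, i.e. the section that had to be removed above. It assumes the YAML has already been loaded into a plain dict; the function names are illustrative and not part of any Elasticsearch tooling.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten(settings: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.key, value) pairs from a nested settings dict."""
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def cloud_aws_keys(settings: Dict[str, Any]) -> List[str]:
    """Return the dotted names of all settings under cloud.aws."""
    return sorted(k for k, _ in flatten(settings) if k.startswith("cloud.aws."))


if __name__ == "__main__":
    # Trimmed-down version of the 5.6 settings posted above.
    settings_56 = {
        "cloud": {"aws": {"region": "{{ es_aws_region }}"}},
        "discovery": {
            "zen.hosts_provider": "ec2",
            "ec2": {"host_type": "private_ip", "any_group": False},
        },
    }
    print(cloud_aws_keys(settings_56))
```

Any key it reports is one to replace (here, with discovery.ec2.endpoint) before starting an ES6 node.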

dadoonet · November 26, 2017, 12:06am

I think we should improve the documentation about endpoint.

As you were using region, it should have complained with 5.6 as it’s marked as deprecated:

https://github.com/elastic/elasticsearch/blob/5.6/plugins/discovery-ec2/src/main/java/org/elasticsearch/discovery/ec2/AwsEc2Service.java#L86-L87

Setting<String> REGION_SETTING =
    new Setting<>("cloud.aws.region", "", s -> s.toLowerCase(Locale.ROOT), Property.NodeScope, Property.Shared, Property.Deprecated);

trondhindenes · November 26, 2017, 9:20am

That would have gone in the deprecation log, right? I'm guilty. We didn't pay close enough attention to it. I was trusting the upgrade advisor too much, I guess.

dadoonet · November 26, 2017, 10:12pm

@Clinton_Gormley1 Is the migration assistant supposed to detect deprecated settings including the ones defined in plugins?

system · December 24, 2017, 10:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

a5a · January 3, 2018, 3:49am

@dadoonet The Upgrade Assistant only shows cluster settings, node settings, index mappings, and machine learning settings that are deprecated. It doesn't show anything specific to plugins.
