Nodes not communicating in 3-node cluster

Hello,

I have deployed a 3-node Elasticsearch 7.0 cluster from the Azure Marketplace. However, I'm having trouble getting the nodes to see each other. Below is the elasticsearch.yml for two of the nodes.

Node 1 is called esp-data-0 (192.248.16.7). Here is the elasticsearch.yml

cluster.name: "elasticsearch"
node.name: "esp-data-0"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
cluster.initial_master_nodes: ["esp-data-0","esp-data-1","esp-data-2"]
discovery.seed_hosts: ["192.248.16.7","192.248.16.6","192.248.16.8"]
network.bind_host: "192.248.16.7"
network.publish_host: "192.248.16.7"
transport.port: "9300"
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: ["192.248.16.7",_local_]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 1
node.attr.update_domain: 1
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: false
bootstrap.memory_lock: true

Nodes 2 and 3 also have master and data set to true. See node 2 below:

cluster.name: "elasticsearch"
node.name: "esp-data-1"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
cluster.initial_master_nodes: ["esp-data-0","esp-data-1","esp-data-2"]
discovery.seed_hosts: ["192.248.16.7","192.248.16.6","192.248.16.8"]
network.bind_host: "192.248.16.6"
network.publish_host: "192.248.16.6"
transport.port: "9300"
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: ["192.248.16.6",_local_]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 0
node.attr.update_domain: 0
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: false
bootstrap.memory_lock: true

On all three nodes, when I run the command below to view the cluster topology, each node only sees itself and not the other two nodes in the cluster. I CAN, however, telnet successfully from each node to the others on ports 9200/9300, so I believe the issue must reside in my Elasticsearch config.

telnet 192.248.16.6 9200
Trying 192.248.16.6...
Connected to 192.248.16.6.
telnet 192.248.16.6 9300
Trying 192.248.16.6...
Connected to 192.248.16.6.

Can you please suggest why this may be?

Result of running nodes command on node 1:
curl 192.248.16.7:9200/_cat/nodes?v

ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.248.16.7            3          68   0    0.00    0.02     0.00 mdi       *      esp-data-0

Result of running nodes command on node 2:
curl 192.248.16.6:9200/_cat/nodes?v

ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.248.16.6            3          68   0    0.00    0.02     0.00 mdi       *      esp-data-1

Hi,

Try changing cluster.initial_master_nodes: ["esp-data-0","esp-data-1","esp-data-2"] to cluster.initial_master_nodes: ["192.248.16.7","192.248.16.6","192.248.16.8"].
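
To spell it out, the suggested elasticsearch.yml change (using the IP addresses from the configs posted above) would be:

cluster.initial_master_nodes: ["192.248.16.7","192.248.16.6","192.248.16.8"]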

Please provide the Elasticsearch log output to better understand the problem.

Best Regards,

I think what @Paul_14 has in their config looks correct. I'd recommend using node names instead of IP addresses in this setting.

However I do agree that the node logs are what we need to see to make further progress.

Hi David & Cristiano,
I made the change suggested by Cristiano and restarted Elasticsearch, but unfortunately it didn't resolve the issue. I then reverted the change and restarted again. The log output on restart is the same whether I use IP addresses or node names for cluster.initial_master_nodes. Here is the log output after restarting:

logs when using IP address instead of node name:

[2019-06-12T12:11:54,553][INFO ][o.e.e.NodeEnvironment    ] [esp-data-0] using [1] data paths, mounts [[/datadisks/disk1 (/dev/sdc1)]], net usable_space [478.1gb], net total_space [503.8gb], types [ext4]
[2019-06-12T12:11:54,558][INFO ][o.e.e.NodeEnvironment    ] [esp-data-0] heap size [7.7gb], compressed ordinary object pointers [true]
[2019-06-12T12:11:54,589][INFO ][o.e.n.Node               ] [esp-data-0] node name [esp-data-0], node ID [IzZPSNlbSO2mNwtuPGSoOw]
[2019-06-12T12:11:54,589][INFO ][o.e.n.Node               ] [esp-data-0] version[7.0.0], pid[13269], build[default/deb/b7e28a7/2019-04-05T22:55:32.697037Z], OS[Linux/4.15.0-1045-azure/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/12/12+33]
[2019-06-12T12:11:54,590][INFO ][o.e.n.Node               ] [esp-data-0] JVM home [/usr/share/elasticsearch/jdk]
[2019-06-12T12:11:54,590][INFO ][o.e.n.Node               ] [esp-data-0] JVM arguments [-Xms8009m, -Xmx8009m, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-6877794265839147032, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -Dio.netty.allocator.type=pooled, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb, -Des.bundled_jdk=true]
[2019-06-12T12:11:56,587][INFO ][o.e.p.PluginsService     ] [esp-data-0] loaded module [aggs-matrix-stats]

**--- Loaded modules --- (removed this from logs to save space)**

[2019-06-12T12:11:56,608][INFO ][o.e.p.PluginsService     ] [esp-data-0] no plugins loaded
[2019-06-12T12:12:01,337][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [esp-data-0] [controller/13373] [Main.cc@109] controller (64 bit): Version 7.0.0 (Build cdaa022645f38d) Copyright (c) 2019 Elasticsearch BV
[2019-06-12T12:12:02,244][INFO ][o.e.d.DiscoveryModule    ] [esp-data-0] using discovery type [zen] and seed hosts providers [settings]
[2019-06-12T12:12:03,355][INFO ][o.e.n.Node               ] [esp-data-0] initialized
[2019-06-12T12:12:03,355][INFO ][o.e.n.Node               ] [esp-data-0] starting ...
[2019-06-12T12:12:03,495][INFO ][o.e.t.TransportService   ] [esp-data-0] publish_address {192.248.16.7:9300}, bound_addresses {192.248.16.7:9300}
[2019-06-12T12:12:03,503][INFO ][o.e.b.BootstrapChecks    ] [esp-data-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-06-12T12:12:03,756][INFO ][o.e.c.s.MasterService    ] [esp-data-0] elected-as-master ([1] nodes joined)[{esp-data-0}{IzZPSNlbSO2mNwtuPGSoOw}{FYG_9me0SWOfmDE2RtxG9w}{192.248.16.7}{192.248.16.7:9300}{ml.machine_memory=16796557312, xpack.installed=true, update_domain=1, ml.max_open_jobs=20, fault_domain=1} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 15, version: 108, reason: master node changed {previous [], current [{esp-data-0}{IzZPSNlbSO2mNwtuPGSoOw}{FYG_9me0SWOfmDE2RtxG9w}{192.248.16.7}{192.248.16.7:9300}{ml.machine_memory=16796557312, xpack.installed=true, update_domain=1, ml.max_open_jobs=20, fault_domain=1}]}
[2019-06-12T12:12:03,977][INFO ][o.e.c.s.ClusterApplierService] [esp-data-0] master node changed {previous [], current [{esp-data-0}{IzZPSNlbSO2mNwtuPGSoOw}{FYG_9me0SWOfmDE2RtxG9w}{192.248.16.7}{192.248.16.7:9300}{ml.machine_memory=16796557312, xpack.installed=true, update_domain=1, ml.max_open_jobs=20, fault_domain=1}]}, term: 15, version: 108, reason: Publication{term=15, version=108}
[2019-06-12T12:12:04,033][INFO ][o.e.h.AbstractHttpServerTransport] [esp-data-0] publish_address {192.248.16.7:9200}, bound_addresses {192.248.16.7:9200}
[2019-06-12T12:12:04,033][INFO ][o.e.n.Node               ] [esp-data-0] started
[2019-06-12T12:12:04,286][INFO ][o.e.l.LicenseService     ] [esp-data-0] license [786904f8-9db2-4771-a652-d920bd8db4ec] mode [trial] - valid
[2019-06-12T12:12:04,295][INFO ][o.e.g.GatewayService     ] [esp-data-0] recovered [6] indices into cluster_state
[2019-06-12T12:12:05,046][INFO ][o.e.c.r.a.AllocationService] [esp-data-0] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.kibana_task_manager][0], [.kibana_1][0]] ...]).

I think the first time you started these nodes up you did not have cluster.initial_master_nodes set, so this note in the manual applies to you, along with its advice on how to diagnose this for sure (each node reports a different cluster UUID) and the remedy (wipe the data and start again).
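
To check this, you can compare the cluster_uuid each node reports on its root endpoint (IPs taken from the configs above). Something like:

curl -s 192.248.16.7:9200/ | grep cluster_uuid
curl -s 192.248.16.6:9200/ | grep cluster_uuid
curl -s 192.248.16.8:9200/ | grep cluster_uuid

If each node reports a different cluster_uuid, each one has bootstrapped its own one-node cluster, and the nodes will refuse to join each other.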

If that's not the case then let us know and we'll dig deeper.


Thanks very much David. This resolved the issue.

> curl 192.248.16.8:9200/_cat/nodes?v
> ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
> 192.248.16.6            2          67   0    0.10    0.18     0.10 mdi       *      esp-data-1
> 192.248.16.8            3          68   0    0.06    0.13     0.08 mdi       -      esp-data-2
> 192.248.16.7            2          68   0    0.04    0.13     0.08 mdi       -      esp-data-0

I had actually come across a similar instruction previously; however, when I went to /var/elasticsearch/data it was empty... It turns out that when deployed from Azure, path.data is set to /datadisks/disk1/elasticsearch/data by default. Deleting the data there and restarting resolved the problem. :grinning:
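
For anyone else hitting this, the wipe-and-restart on each node looked roughly like the below. This is destructive (it discards all data on the node) and assumes the systemd service name used by the deb package; adjust the data path to whatever path.data is set to in your elasticsearch.yml:

sudo systemctl stop elasticsearch
sudo rm -rf /datadisks/disk1/elasticsearch/data/*
sudo systemctl start elasticsearch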

Cheers,
Paul
