Elasticsearch 3-node cluster: nodes not joining each other

Hi

I have a 3-node cluster with the config below.

Node 1:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-1"
node.data: true
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip1"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 2:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-3"
node.data: true
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip2"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 3:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-2"
node.data: true
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip3"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

I updated the yml files one by one and restarted Elasticsearch. Sometimes (not every time) one or two nodes dropped out of the cluster and formed their own.

I guess this could happen because all 3 are master-eligible. I can only afford a 3-node cluster, so I wanted all three to be master-eligible so that the cluster still works with at least two nodes up.

How can I fix this intermittent issue of nodes dropping out?

You can configure a node to be only a data node or only a master-eligible node, which provides a nice separation of the master and data roles.

Data nodes can focus on indexing and searching over the data. Only one of the master-eligible will be the master at any given time. You can lose one of the master-eligible nodes and still have a quorum of two nodes eligible to hold an election for deciding who will be the master.
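
For example, you can check which node currently holds the master role at any time (assuming HTTP is reachable on port 9200 as in your config; ip1 stands for one of your node addresses):

```
curl 'http://ip1:9200/_cat/master?v'
```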

I see only data nodes in your configuration. Check out the node roles docs.
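
For instance, a minimal sketch of dedicated roles (assuming Elasticsearch 7.9+, where `node.roles` replaces the legacy `node.master`/`node.data` flags):

```yaml
# elasticsearch.yml on a dedicated master-eligible node
node.roles: [ master ]

# elasticsearch.yml on a dedicated data node
# node.roles: [ data ]
```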

Hope this was helpful!

I assumed that, according to my configuration, all three are master-eligible and data nodes. Could that be the cause of nodes not joining the cluster at times?

No, that would not be the cause. For a three-node cluster your configuration is the recommended one, as there is no point in using dedicated node types until you have a larger cluster.

Please provide the full configuration properly formatted. You have removed some data and the issue may be in those parts.

I have only this much in elasticsearch.yml.
discovery.seed_hosts has the IPs of all three nodes.

Node 1:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-1"
node.data: true
path.repo: "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip1"
http.port: 9200
discovery.seed_hosts:
  - ip1
  - ip2
  - ip3
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 2:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-3"
node.data: true
path.repo: "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip2"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 3:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-2"
node.data: true
path.repo: "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip3"
http.port: 9200
discovery.seed_hosts:
  - ip1
  - ip2
  - ip3
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Can someone reply please?

This is the Elasticsearch log for a node that did not join the cluster:

```
[2021-06-27T21:09:19,893][INFO ][o.e.e.NodeEnvironment ] [node-3] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [3.3gb], net total_space [9.7gb], types [ext4]
[2021-06-27T21:09:19,894][INFO ][o.e.e.NodeEnvironment ] [node-3] heap size [981.5mb], compressed ordinary object pointers [true]
[2021-06-27T21:09:20,037][INFO ][o.e.n.Node ] [node-3] node name [node-3], node ID [3PHdy0c4RwW8Q9A2Vqk7cg], cluster name [as_elasticsearch], roles [transform, master, remote_cluster_client, data, ml, data_content, data_hot, data_warm, data_cold, ingest]
[2021-06-27T21:09:24,925][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [node-3] [controller/29756] [Main.cc@114] controller (64 bit): Version 7.10.2 (Build 40a3af639d4698) Copyright (c) 2021 Elasticsearch BV
[2021-06-27T21:09:25,738][INFO ][o.e.x.s.a.s.FileRolesStore] [node-3] parsed [0] roles from file [/etc/elasticsearch/roles.yml]
[2021-06-27T21:09:27,150][INFO ][o.e.t.NettyAllocator ] [node-3] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=1mb, factors={es.unsafe.use_unpooled_allocator=null, g1gc_enabled=false, g1gc_region_size=0b, heap_size=981.5mb}]
[2021-06-27T21:09:27,232][INFO ][o.e.d.DiscoveryModule ] [node-3] using discovery type [zen] and seed hosts providers [settings]
[2021-06-27T21:09:27,852][WARN ][o.e.g.DanglingIndicesState] [node-3] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2021-06-27T21:09:28,405][INFO ][o.e.n.Node ] [node-3] initialized
[2021-06-27T21:09:28,406][INFO ][o.e.n.Node ] [node-3] starting ...
[2021-06-27T21:09:28,542][INFO ][o.e.t.TransportService ] [node-3] publish_address {ip:9300}, bound_addresses {ip:9300}
[2021-06-27T21:09:29,071][INFO ][o.e.b.BootstrapChecks ] [node-3] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-06-27T21:09:29,122][INFO ][o.e.c.c.Coordinator ] [node-3] cluster UUID [1vP_rjJ_QNKlXIphtSxo0g]
[2021-06-27T21:09:29,345][INFO ][o.e.c.s.MasterService ] [node-3] elected-as-master ([1] nodes joined)[{node-3}{3PHdy0c4RwW8Q9A2Vqk7cg}{bA3XWBi8RCCRvVYf-9oK6g}{ip}{ip:9300}{cdhilmrstw}{ml.machine_memory=3877740544, xpack.installed=true, transform.node=true, ml.max_open_jobs=20} elect leader, BECOME_MASTER_TASK, FINISH_ELECTION], term: 3, version: 3185, delta: master node changed {previous [], current [{node-3}{3PHdy0c4RwW8Q9A2Vqk7cg}{bA3XWBi8RCCRvVYf-9oK6g}{ip}{ip:9300}{cdhilmrstw}{ml.machine_memory=3877740544, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]}
[2021-06-27T21:09:29,449][INFO ][o.e.c.s.ClusterApplierService] [node-3] master node changed {previous [], current [{node-3}{3PHdy0c4RwW8Q9A2Vqk7cg}{bA3XWBi8RCCRvVYf-9oK6g}{ip}{ip:9300}{cdhilmrstw}{ml.machine_memory=3877740544, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]}, term: 3, version: 3185, reason: Publication{term=3, version=3185}
[2021-06-27T21:09:29,589][INFO ][o.e.h.AbstractHttpServerTransport] [node-3] publish_address {ip:9200}, bound_addresses {ip:9200}
[2021-06-27T21:09:29,590][INFO ][o.e.n.Node ] [node-3] started
[2021-06-27T21:09:30,365][INFO ][o.e.l.LicenseService ] [node-3] license [0c4538f5-558a-4f67-99ef-cb280738eac5] mode [basic] - valid
[2021-06-27T21:09:30,366][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [node-3] Active license is now [BASIC]; Security is disabled
[2021-06-27T21:09:30,373][INFO ][o.e.g.GatewayService ] [node-3] recovered [5] indices into cluster_state
[2021-06-27T21:10:00,758][WARN ][r.suppressed ] [node-3] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
```

Sometimes in debug mode, I got the below error during cluster formation:

```
[2021-06-27T21:11:55,063][INFO ][o.e.n.Node ] [node-1] started
[2021-06-27T21:11:55,295][DEBUG][o.e.a.s.m.TransportMasterNodeAction] [node-1] can't execute due to a cluster block, retrying
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:190) ~[elasticsearch-7.10.2.jar:7.10.2]
    at org.elasticsearch.license.TransportGetLicenseAction.checkBlock(TransportGetLicenseAction.java:50) ~[?:?]
    at org.elasticsearch.license.TransportGetLicenseAction.checkBlock(TransportGetLicenseAction.java:25) ~[?:?]
```

I don't know whether this could be the root cause.

The config you shared looks ok but the symptoms indicate you started these nodes with a different config in the past, triggering auto-bootstrapping. See the note at the bottom of this page for more details.
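
One way to confirm this is to compare the `cluster_uuid` each node reports; nodes that bootstrapped separate clusters will show different UUIDs. A sketch, using the ip1/ip2/ip3 placeholders from your configs:

```
curl -s 'http://ip1:9200/?filter_path=cluster_uuid'
curl -s 'http://ip2:9200/?filter_path=cluster_uuid'
curl -s 'http://ip3:9200/?filter_path=cluster_uuid'
```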

I performed the following steps:

  1. Updated the first node's elasticsearch.yml as above (Node 1)
  2. Restarted the first node
  3. Installed Elasticsearch on the second node, which starts with the default config. Then I updated node 2's elasticsearch.yml, deleted the data dir, and restarted Elasticsearch
  4. Installed Elasticsearch on the third node, which starts with the default config. Then I updated node 3's elasticsearch.yml, deleted the data dir, and restarted Elasticsearch

The expected result is that all nodes join a single cluster, but sometimes one of the nodes does not join.

As per the above doc, I have `discovery.seed_hosts` and `cluster.initial_master_nodes`, but I don't have `discovery.seed_providers`. That shouldn't be the cause, as I have the other two.

I don't get anything useful related to cluster formation in the logs either.

Can someone please update?

Please post your config properly formatted using the tools available through the UI here. YAML is indentation-sensitive, so without correct formatting it may be impossible to spot errors.

I just masked the 3 IPs and shared the config below.
I have the same config on all 3 nodes, changing only the seed_hosts and node.name.

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-1"
node.data: true
path.logs: "/var/opt/novell/nam/logs/elasticsearch/"
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip1"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Are these details enough, or do you need more?

If your nodes all started up with the configs you shared and an empty data path, then they would not create multiple clusters. I can think of a few explanations:

  • the config you've shared isn't the config that Elasticsearch sees
  • the path that you're clearing out is not the data path that Elasticsearch is using (see the check below)
  • you didn't stop all the nodes as the docs I linked above instructed
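
To rule out the first two, you can ask a running node which settings and paths it actually loaded, for example:

```
curl -s 'http://ip1:9200/_nodes/settings?pretty'
```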

I'm not sure, but I can confirm the following:

  1. I'm using the same config and the same data directory
  2. I'm using the same script to delete the data dir and restart Elasticsearch

If I follow the above 2 steps multiple times, I am able to reproduce it a few times. To me it seems like an intermittent issue. Can I at least force Elasticsearch to look for a cluster, or should I wait for some time before starting the second node, so that the first node creates a cluster?

Does the script really do what the docs say? I.e., shut all the nodes down, wipe all their data directories, fix their configs, and then start them all again? Your earlier message indicated you weren't doing that.

I don't know which earlier message created the confusion, but I am doing the following through Java code:

  1. Three nodes were running individually, each as its own default cluster
  2. The first node's yml file was updated. The data directory was not deleted here, as I need the data on this node. At this time the other two nodes were still running as single-node clusters
  3. The second node's yml was updated, its data directory deleted, and the node restarted. The third node was still running in default mode
  4. The third node's yml was updated, its data directory deleted, and the node restarted

I am doing these steps through Java code.

This was the message I meant.

Yes, you're not doing what the docs tell you to do. You must shut down and wipe all the nodes at the same time; you're doing them one at a time.
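
In outline, the procedure looks something like this (a sketch, assuming systemd-managed services and the /var/lib/elasticsearch data path from your config; note this destroys the data on every node):

```
# Step 1: stop Elasticsearch on ALL three nodes before touching anything
sudo systemctl stop elasticsearch

# Step 2: on ALL three nodes, wipe the data path and fix elasticsearch.yml
sudo rm -rf /var/lib/elasticsearch/*

# Step 3: only after every node is stopped and wiped, start them all again
sudo systemctl start elasticsearch
```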

I'd recommend using snapshots to keep hold of any data you need.
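
For example, with the `path.repo` from your config, registering a filesystem repository and taking a snapshot could look like this (the repository and snapshot names are placeholders):

```
curl -X PUT 'http://ip1:9200/_snapshot/my_backup' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/var/elasticsearch/snapshot/" } }'

curl -X PUT 'http://ip1:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'
```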

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.