Elasticsearch 3-node cluster: nodes not joining each other

Hi

I have a 3-node cluster with the config below.

Node 1:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-1"
node.data: true
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip1"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 2:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-3"
node.data: true
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip2"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 3:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-2"
node.data: true
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip3"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

I updated the yml files one by one and restarted Elasticsearch. Sometimes (not every time) one or two nodes dropped out of the cluster and formed their own.

I guess this could happen because all 3 are master-eligible. I can only afford a 3-node cluster, so I wanted all three to be master-eligible so that the cluster still works with at least two nodes up.

How can I fix this intermittent issue of nodes dropping out?

You can configure a node to be only a data node or only a master-eligible node, which provides a nice separation of the master and data roles.

Data nodes can focus on indexing and searching over the data. Only one of the master-eligible will be the master at any given time. You can lose one of the master-eligible nodes and still have a quorum of two nodes eligible to hold an election for deciding who will be the master.
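
For example, you can check which node currently holds the master role at any time (assuming HTTP is reachable on port 9200 as in your config; ip1 stands for one of your node addresses):

```
curl 'http://ip1:9200/_cat/master?v'
```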

I see only data nodes in your configuration. Check out the node roles docs.
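
For instance, a minimal sketch of dedicated roles (assuming Elasticsearch 7.9+, where `node.roles` replaces the legacy `node.master`/`node.data` flags):

```yaml
# elasticsearch.yml on a dedicated master-eligible node
node.roles: [ master ]

# elasticsearch.yml on a dedicated data node
# node.roles: [ data ]
```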

Hope this was helpful!

I assumed that, according to my configuration, all three are master-eligible and data nodes. Could that be the cause of nodes not joining the cluster at times?

No, that would not be the cause. For a three-node cluster your configuration is the recommended one, as there is no point in using dedicated node types until you have a larger cluster.

Please provide the full configuration properly formatted. You have removed some data and the issue may be in those parts.

I have only this much in elasticsearch.yml.
discovery.seed_hosts has the IPs of all three nodes.

Node 1:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-1"
node.data: true
path.repo: "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip1"
http.port: 9200
discovery.seed_hosts:
  - ip1
  - ip2
  - ip3
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 2:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-3"
node.data: true
path.repo: "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip2"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Node 3:

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-2"
node.data: true
path.repo: "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip3"
http.port: 9200
discovery.seed_hosts:
  - ip1
  - ip2
  - ip3
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Can someone reply please?

This is the Elasticsearch log for a node that did not join the cluster:

```
[2021-06-27T21:09:19,893][INFO ][o.e.e.NodeEnvironment ] [node-3] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [3.3gb], net total_space [9.7gb], types [ext4]
[2021-06-27T21:09:19,894][INFO ][o.e.e.NodeEnvironment ] [node-3] heap size [981.5mb], compressed ordinary object pointers [true]
[2021-06-27T21:09:20,037][INFO ][o.e.n.Node ] [node-3] node name [node-3], node ID [3PHdy0c4RwW8Q9A2Vqk7cg], cluster name [as_elasticsearch], roles [transform, master, remote_cluster_client, data, ml, data_content, data_hot, data_warm, data_cold, ingest]
[2021-06-27T21:09:24,925][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [node-3] [controller/29756] [Main.cc@114] controller (64 bit): Version 7.10.2 (Build 40a3af639d4698) Copyright (c) 2021 Elasticsearch BV
[2021-06-27T21:09:25,738][INFO ][o.e.x.s.a.s.FileRolesStore] [node-3] parsed [0] roles from file [/etc/elasticsearch/roles.yml]
[2021-06-27T21:09:27,150][INFO ][o.e.t.NettyAllocator ] [node-3] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=1mb, factors={es.unsafe.use_unpooled_allocator=null, g1gc_enabled=false, g1gc_region_size=0b, heap_size=981.5mb}]
[2021-06-27T21:09:27,232][INFO ][o.e.d.DiscoveryModule ] [node-3] using discovery type [zen] and seed hosts providers [settings]
[2021-06-27T21:09:27,852][WARN ][o.e.g.DanglingIndicesState] [node-3] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2021-06-27T21:09:28,405][INFO ][o.e.n.Node ] [node-3] initialized
[2021-06-27T21:09:28,406][INFO ][o.e.n.Node ] [node-3] starting ...
[2021-06-27T21:09:28,542][INFO ][o.e.t.TransportService ] [node-3] publish_address {ip:9300}, bound_addresses {ip:9300}
[2021-06-27T21:09:29,071][INFO ][o.e.b.BootstrapChecks ] [node-3] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-06-27T21:09:29,122][INFO ][o.e.c.c.Coordinator ] [node-3] cluster UUID [1vP_rjJ_QNKlXIphtSxo0g]
[2021-06-27T21:09:29,345][INFO ][o.e.c.s.MasterService ] [node-3] elected-as-master ([1] nodes joined)[{node-3}{3PHdy0c4RwW8Q9A2Vqk7cg}{bA3XWBi8RCCRvVYf-9oK6g}{ip}{ip:9300}{cdhilmrstw}{ml.machine_memory=3877740544, xpack.installed=true, transform.node=true, ml.max_open_jobs=20} elect leader, BECOME_MASTER_TASK, FINISH_ELECTION], term: 3, version: 3185, delta: master node changed {previous [], current [{node-3}{3PHdy0c4RwW8Q9A2Vqk7cg}{bA3XWBi8RCCRvVYf-9oK6g}{ip}{ip:9300}{cdhilmrstw}{ml.machine_memory=3877740544, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]}
[2021-06-27T21:09:29,449][INFO ][o.e.c.s.ClusterApplierService] [node-3] master node changed {previous [], current [{node-3}{3PHdy0c4RwW8Q9A2Vqk7cg}{bA3XWBi8RCCRvVYf-9oK6g}{ip}{ip:9300}{cdhilmrstw}{ml.machine_memory=3877740544, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]}, term: 3, version: 3185, reason: Publication{term=3, version=3185}
[2021-06-27T21:09:29,589][INFO ][o.e.h.AbstractHttpServerTransport] [node-3] publish_address {ip:9200}, bound_addresses {ip:9200}
[2021-06-27T21:09:29,590][INFO ][o.e.n.Node ] [node-3] started
[2021-06-27T21:09:30,365][INFO ][o.e.l.LicenseService ] [node-3] license [0c4538f5-558a-4f67-99ef-cb280738eac5] mode [basic] - valid
[2021-06-27T21:09:30,366][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [node-3] Active license is now [BASIC]; Security is disabled
[2021-06-27T21:09:30,373][INFO ][o.e.g.GatewayService ] [node-3] recovered [5] indices into cluster_state
[2021-06-27T21:10:00,758][WARN ][r.suppressed ] [node-3] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
```

Sometimes in debug mode, I got the below error during cluster formation:

```
[2021-06-27T21:11:55,063][INFO ][o.e.n.Node ] [node-1] started
[2021-06-27T21:11:55,295][DEBUG][o.e.a.s.m.TransportMasterNodeAction] [node-1] can't execute due to a cluster block, retrying
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:190) ~[elasticsearch-7.10.2.jar:7.10.2]
    at org.elasticsearch.license.TransportGetLicenseAction.checkBlock(TransportGetLicenseAction.java:50) ~[?:?]
    at org.elasticsearch.license.TransportGetLicenseAction.checkBlock(TransportGetLicenseAction.java:25) ~[?:?]
```

I don't know whether this could be the root cause.

The config you shared looks ok but the symptoms indicate you started these nodes with a different config in the past, triggering auto-bootstrapping. See the note at the bottom of this page for more details.
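
One way to confirm this is to compare the `cluster_uuid` each node reports; nodes that bootstrapped separate clusters will show different UUIDs. A sketch, using the ip1/ip2/ip3 placeholders from your configs:

```
curl -s 'http://ip1:9200/?filter_path=cluster_uuid'
curl -s 'http://ip2:9200/?filter_path=cluster_uuid'
curl -s 'http://ip3:9200/?filter_path=cluster_uuid'
```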

I performed the following steps:

  1. Updated the first node's elasticsearch.yml as above (Node 1)
  2. Restarted the first node
  3. Installed Elasticsearch on the second node, which starts with the default config. Then I updated node 2's elasticsearch.yml, deleted the data dir, and restarted Elasticsearch
  4. Installed Elasticsearch on the third node, which starts with the default config. Then I updated node 3's elasticsearch.yml, deleted the data dir, and restarted Elasticsearch

The expected result is that all nodes join a single cluster, but sometimes one of the nodes does not join.

As per the above doc, I have `discovery.seed_hosts` and `cluster.initial_master_nodes`, but I don't have `discovery.seed_providers`. That shouldn't be the cause, as I have the other two.

I don't get anything useful related to cluster formation in the logs either.

Can someone please update?

Please post your config properly formatted using the tools available through the UI here. YAML is indentation-sensitive, so without correct formatting it may be impossible to spot errors.

I just masked the 3 IPs and shared the config below.
I have the same config on all 3 nodes, changing only the seed_hosts and node.name.

```yaml
cluster.name: "as_elasticsearch"
node.name: "node-1"
node.data: true
path.logs: "/var/opt/novell/nam/logs/elasticsearch/"
path.repo:
  - "/var/elasticsearch/snapshot/"
path.data: "/var/lib/elasticsearch"
network.host: "ip1"
http.port: 9200
discovery.seed_hosts:
  - "ip1"
  - "ip2"
  - "ip3"
cluster.initial_master_nodes:
  - "node-1"
  - "node-2"
  - "node-3"
```

Are these details enough, or do you need more?

If your nodes all started up with the configs you shared and an empty data path, then they would not create multiple clusters. I can think of a few explanations:

  • the config you've shared isn't the config that Elasticsearch sees
  • the path that you're clearing out is not the data path that Elasticsearch is using (see the check below)
  • you didn't stop all the nodes as the docs I linked above instructed
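
To rule out the first two, you can ask a running node which settings and paths it actually loaded, for example:

```
curl -s 'http://ip1:9200/_nodes/settings?pretty'
```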

I'm not sure, but I can confirm the following:

  1. I'm using the same config and the same data directory
  2. I'm using the same script to delete the data dir and restart Elasticsearch

If I follow the above 2 steps multiple times, I am able to reproduce it a few times. To me it seems like an intermittent issue. Can I at least force Elasticsearch to look for a cluster, or should I wait for some time before starting the second node, so that the first node creates a cluster?

Does the script really do what the docs say? I.e., shut all the nodes down, wipe all their data directories, fix their configs, and then start them all again? Your earlier message indicated you weren't doing that.

I don't know which earlier message created the confusion, but I am doing the following through Java code:

  1. Three nodes were running individually, each as its own default cluster
  2. The first node's yml file was updated. The data directory was not deleted here, as I need the data on this node. At this time the other two nodes were still running as single-node clusters
  3. The second node's yml was updated, its data directory deleted, and the node restarted. The third node was still running in default mode
  4. The third node's yml was updated, its data directory deleted, and the node restarted

I am doing these steps through Java code.

This was the message I meant.

Yes, you're not doing what the docs tell you to do. You must shut down and wipe all the nodes at the same time; you're doing them one at a time.
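
In outline, the procedure looks something like this (a sketch, assuming systemd-managed services and the /var/lib/elasticsearch data path from your config; note this destroys the data on every node):

```
# Step 1: stop Elasticsearch on ALL three nodes before touching anything
sudo systemctl stop elasticsearch

# Step 2: on ALL three nodes, wipe the data path and fix elasticsearch.yml
sudo rm -rf /var/lib/elasticsearch/*

# Step 3: only after every node is stopped and wiped, start them all again
sudo systemctl start elasticsearch
```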

I'd recommend using snapshots to keep hold of any data you need.
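
For example, with the `path.repo` from your config, registering a filesystem repository and taking a snapshot could look like this (the repository and snapshot names are placeholders):

```
curl -X PUT 'http://ip1:9200/_snapshot/my_backup' \
  -H 'Content-Type: application/json' \
  -d '{ "type": "fs", "settings": { "location": "/var/elasticsearch/snapshot/" } }'

curl -X PUT 'http://ip1:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'
```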

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.