Joining node to cluster - what am I doing wrong?

We have an existing cluster consisting of a single node. We would like to add another node to the cluster. Both nodes are running Elasticsearch 8.13.

We followed the instructions here, and additionally here.

After configuring elasticsearch.yml on each node, I restarted the service using sudo systemctl restart elasticsearch.

Here are the elasticsearch.yml files for the existing node ('zeek1') and the new node ('zeek2'):

Existing node:

node.name: zeek1
path.data: /mnt/Bro/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [zeek1 IP]


xpack.security.enabled: true

xpack.security.enrollment.enabled: true

xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
cluster.initial_master_nodes: ["zeek1"]

http.host: 0.0.0.0

transport.host: 0.0.0.0

New node:

node.name: zeek2
path.data: /mnt/Bro/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [zeek2 IP]
discovery.seed_hosts:
  - [zeek1 IP]
cluster.initial_master_nodes:
  - zeek1


xpack.security.enabled: true

xpack.security.enrollment.enabled: true

xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

xpack.security.transport.ssl:node.name: zeek1
path.data: /mnt/Bro/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [zeek1 IP]


xpack.security.enabled: true

xpack.security.enrollment.enabled: true

xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
cluster.initial_master_nodes: ["zeek1"]

http.host: 0.0.0.0

transport.host: 0.0.0.0
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
discovery.seed_hosts: ["zeek1 IP"]

http.host: 0.0.0.0

transport.host: 0.0.0.0

The elasticsearch.log file has many entries like this:

[2024-04-04T15:12:26,111][WARN ][o.e.c.c.ClusterFormationFailureHelper] [zeek2] master not discovered yet, this node has not previously joined a bootstrapped cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{zeek2}{lXTEDQtlRei7ZRW8c8BpGg}{rel14QqhSVqIBDMuUO_Vbw}{zeek2}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{8.13.1}{7000099-8503000}]; discovery will continue using [] from hosts providers and [{zeek2}{lXTEDQtlRei7ZRW8c8BpGg}{rel14QqhSVqIBDMuUO_Vbw}{zeek2}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{8.13.1}{7000099-8503000}] from last-known cluster state; node term 0, last-accepted version 0 in term 0; for troubleshooting guidance, see https://www.elastic.co/guide/en/elasticsearch/reference/8.13/discovery-troubleshooting.html

However, the most recent entries in this log were from a few hours before the most recent changes to elasticsearch.yml. This confuses me!

Both nodes are on Ubuntu 22.04. Both nodes are on the same broadcast domain and have full connectivity between them. I have disabled ufw for testing purposes.

What am I doing wrong?

It doesn't look like zeek2 is really using that config file. I would double check that you've edited the right file and/or that you're pointing the new node at the right config directory.

Also, you must not use cluster.initial_master_nodes when adding a node to an existing cluster.

The new node's config file has some duplicate entries (network.host, path.data, and path.logs), and the line xpack.security.transport.ssl:node.name: zeek1 looks incorrect. I expect Elasticsearch is reporting all of these problems in its startup messages.
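For reference, the discovery-related portion of a joining node's elasticsearch.yml can be this small. This is a sketch with placeholder values (my-cluster, 10.0.0.1), and cluster.initial_master_nodes is deliberately absent:

```yaml
# Sketch of discovery settings for a node joining an existing cluster.
# cluster.initial_master_nodes is intentionally omitted: it is only for
# bootstrapping a brand-new cluster, and should be removed from every
# node once the cluster has formed.
cluster.name: my-cluster     # placeholder; must match the existing cluster's name
node.name: zeek2
discovery.seed_hosts:
  - 10.0.0.1                 # placeholder for the existing node's address
```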

Thanks for your response Tim.

I verified that I'm editing the correct file by checking /etc/default/elasticsearch; it includes ES_PATH_CONF=/etc/elasticsearch

One odd thing: when I run echo ${ES_PATH_CONF}, it returns nothing.

Additionally, I tried:

sudo find / -name elasticsearch.yml
/etc/elasticsearch/elasticsearch.yml

Is there some other way to confirm I'm editing the right config file?
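For context on the empty echo: /etc/default/elasticsearch is sourced by the systemd unit when the service starts, not by interactive shells, so ES_PATH_CONF never reaches a login session. A self-contained sketch of that mechanism (DEMO_VAR is a made-up name; the systemctl line assumes a systemd-managed install):

```shell
# The running service's environment can be inspected directly, e.g.:
#   sudo systemctl show elasticsearch --property=Environment
#
# Demo of the underlying mechanism: a variable set for a child process
# appears in that child's /proc/<pid>/environ but not in our own shell.
DEMO_VAR=from_service sleep 5 &
pid=$!
sleep 0.2                                             # give the child time to exec
tr '\0' '\n' < "/proc/$pid/environ" | grep DEMO_VAR   # child sees it
echo "this shell sees: '${DEMO_VAR:-nothing}'"        # prints 'nothing'
kill "$pid" 2>/dev/null
```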

Thanks David. I had made an error pasting in the node2 elasticsearch.yml config. It should be:

node.name: zeek2
path.data: /mnt/Bro/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [zeek2 IP]
discovery.seed_hosts:
  - [zeek1 IP]


xpack.security.enabled: true

xpack.security.enrollment.enabled: true

xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
discovery.seed_hosts: ["[zeek1 IP]"]

http.host: 0.0.0.0

transport.host: 0.0.0.0

You're still duplicating discovery.seed_hosts, which is fatal: Elasticsearch refuses to start when elasticsearch.yml contains a duplicate key.

Aha yes, thank you; I did notice that in /var/log/syslog.

When I corrected this, I saw in elasticsearch.log an error about cluster names not matching. I fixed this and now the service starts without issue.

However, in the web interface I still only see one node in the cluster, and when I issue curl -XGET 'localhost:9200/_cluster/health?pretty' I get curl: (52) Empty reply from server. There are no relevant log entries in elasticsearch.log on either node.

Interesting - it looks like something is happening. The storage directory on the new node is starting to be populated with indices, and via tcptrack I can see several TCP connections between the two nodes. I just wish there was a better way to see what's going on.
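On the curl error: an empty reply is what curl reports when it sends plain HTTP to a TLS-enabled port, and xpack.security.http.ssl.enabled is true on these nodes. A TLS-aware health check would look something like this (the password is a placeholder; -k skips certificate verification, or point --cacert at the cluster's CA file instead):

```shell
# Query the health endpoint over https with basic auth.
curl -k -u elastic:<password> 'https://localhost:9200/_cluster/health?pretty'
```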

How are you deciding what is or isn't relevant? Probably best to share everything you're seeing.

Aha, I'm learning here. I just discovered that in the log directory, there's a log with the same name as the cluster. I had been looking at the wrong log. This makes sense.

In node 1, I see:

[2024-04-05T09:35:02,332][INFO ][o.e.c.r.a.AllocationService] [zeek1] updating number_of_replicas to [1] for indices [.internal.alerts-observability.slo.alerts-default-000001, .ds-ilm-history-5-2024.02.07-000010, .security-profile-8, .kibana-observability-ai-assistant-conversations-000001, .kibana_alerting_cases_8.9.0_001, .internal.alerts-observability.metrics.alerts-default-000001, .transform-internal-007, metrics-endpoint.metadata_current_default, .transform-notifications-000002, .monitoring-es-7-2024.04.02, .apm-agent-configuration, .kibana_security_session_1, .ds-.logs-deprecation.elasticsearch-default-2024.02.16-000009, .monitoring-es-7-2024.04.01, .kibana_8.9.0_001, .monitoring-es-7-2024.04.04, .ds-.kibana-event-log-8.9.1-2023.10.24-000003, .ds-.monitoring-es-8-mb-2024.01.18-000063, .monitoring-kibana-7-2024.04.05, .slo-observability.sli-v2, .monitoring-es-7-2024.04.03, .monitoring-es-7-2024.03.31, .tasks, .apm-custom-link, .kibana_ingest_8.9.0_001, .ds-.kibana-event-log-8.10.3-2024.04.03-000005, .monitoring-kibana-7-2024.04.03, .monitoring-kibana-7-2024.04.02, .apm-source-map, .internal.alerts-observability.apm.alerts-default-000001, .geoip_databases, .ds-ilm-history-5-2023.12.09-000006, .kibana_security_solution_8.9.0_001, .ds-ilm-history-5-2024.03.08-000012, .security-7, .internal.alerts-stack.alerts-default-000001, .ds-ilm-history-5-2024.01.08-000007, .internal.alerts-security.alerts-default-000002, .ds-.kibana-event-log-8.9.0-2023.09.10-000002, .kibana_analytics_8.9.0_001, .internal.alerts-observability.uptime.alerts-default-000001, .monitoring-kibana-7-2024.03.31, .monitoring-es-7-2024.03.30, .ds-.kibana-event-log-8.10.3-2023.11.10-000002, .slo-observability.summary-v2, .internal.alerts-security.alerts-default-000001, .monitoring-kibana-7-2024.04.04, .slo-observability.summary-v2.temp, .internal.alerts-observability.logs.alerts-default-000001, .ds-.kibana-event-log-8.10.3-2024.01.17-000003, .metrics-endpoint.metadata_united_default, 
.monitoring-es-7-2024.04.05, .kibana_task_manager_8.9.0_001, .monitoring-kibana-7-2024.03.30, .ds-.logs-deprecation.elasticsearch-default-2024.04.03-000012, .monitoring-kibana-7-2024.04.01, .kibana-observability-ai-assistant-kb-000001, .async-search]
[2024-04-05T09:35:02,334][INFO ][o.e.c.s.MasterService    ] [zeek1] node-join[{zeek2}{lXTEDQtlRei7ZRW8c8BpGg}{aKjSX4mUSvCJezQmdQoAHA}{zeek2}{[zeek2 IP]}{[zeek2 IP]:9300}{cdfhilmrstw}{8.13.1}{7000099-8503000} joining], term: 11, version: 9905, delta: added {{zeek2}{lXTEDQtlRei7ZRW8c8BpGg}{aKjSX4mUSvCJezQmdQoAHA}{zeek2}{[zeek2 IP]}{[zeek2 IP]:9300}{cdfhilmrstw}{8.13.1}{7000099-8503000}}
[2024-04-05T09:35:07,970][INFO ][o.e.c.s.ClusterApplierService] [zeek1] added {{zeek2}{lXTEDQtlRei7ZRW8c8BpGg}{aKjSX4mUSvCJezQmdQoAHA}{zeek2}{[zeek2 IP]}{[zeek2 IP]:9300}{cdfhilmrstw}{8.13.1}{7000099-8503000}}, term: 11, version: 9905, reason: Publication{term=11, version=9905}
[2024-04-05T09:35:08,060][INFO ][o.e.c.c.NodeJoinExecutor ] [zeek1] node-join: [{zeek2}{lXTEDQtlRei7ZRW8c8BpGg}{aKjSX4mUSvCJezQmdQoAHA}{zeek2}{[zeek2 IP]}{[zeek2 IP]:9300}{cdfhilmrstw}{8.13.1}{7000099-8503000}] with reason [joining]
[2024-04-05T09:43:33,286][WARN ][o.e.h.n.Netty4HttpServerTransport] [zeek1] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:57042}
[2024-04-05T09:43:35,143][WARN ][o.e.h.n.Netty4HttpServerTransport] [zeek1] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:57054}
[2024-04-05T09:44:39,472][WARN ][o.e.h.n.Netty4HttpServerTransport] [zeek1] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:51550}
[2024-04-05T09:45:05,768][WARN ][o.e.h.n.Netty4HttpServerTransport] [zeek1] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/[zeek1 IP]:9200, remoteAddress=/[zeek2 IP]:57236}
[2024-04-05T09:46:51,364][WARN ][o.e.h.n.Netty4HttpServerTransport] [zeek1] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:42232}
[2024-04-05T10:08:03,465][INFO ][o.e.c.m.MetadataMappingService] [zeek1] [.ds-filebeat-8.10.3-2024.04.05-000093/LIurpTtqSEeOg5cqG0L_Pg] update_mapping [_doc]

In node 2, I see:

[2024-04-05T10:09:03,583][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)
[2024-04-05T10:09:13,591][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)
[2024-04-05T10:09:23,584][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)
[2024-04-05T10:09:33,582][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)
[2024-04-05T10:09:43,609][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)
[2024-04-05T10:09:53,583][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)
[2024-04-05T10:10:03,582][INFO ][o.e.x.m.e.l.LocalExporter] [zeek2] waiting for elected master node [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}] to setup local exporter [default_local] (does it have x-pack installed?)

Here's some more relevant log entries from node 2:

[2024-04-05T09:35:00,909][INFO ][o.e.c.c.ClusterBootstrapService] [zeek2] this node has not joined a bootstrapped cluster yet; [cluster.initial_master_nodes] is set to []
[2024-04-05T09:35:03,500][INFO ][o.e.c.s.ClusterApplierService] [zeek2] master node changed {previous [], current [{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}]}, added {{zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}}, term: 11, version: 9905, reason: ApplyCommitRequest{term=11, version=9905, sourceNode={zeek1}{rKbtrPsMTRGDhzu-e-XdYw}{clNgqfbEQ_6IRRHqY6Pk_g}{zeek1}{[zeek1 IP]}{[zeek1 IP]:9300}{cdfhilmrstw}{8.10.3}{7000099-8100399}{ml.allocated_processors=40, ml.allocated_processors_double=40.0, ml.max_jvm_size=33285996544, ml.config_version=10.0.0, transform.config_version=10.0.0, xpack.installed=true, ml.machine_memory=270348374016}}
[2024-04-05T09:35:03,537][INFO ][o.e.c.s.ClusterSettings  ] [zeek2] updating [xpack.monitoring.collection.enabled] from [false] to [true]

Looks to me like the node is joining the cluster.

Yep me too :+1:
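As a final sanity check, the _cat/nodes endpoint should now list both nodes (password is a placeholder; -k skips certificate verification):

```shell
# List cluster members; both zeek1 and zeek2 should appear in the output.
curl -k -u elastic:<password> 'https://localhost:9200/_cat/nodes?v'
```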

Thanks very much. I learned a lot!
