Data nodes failed to send join request to master


I have an Elasticsearch cluster that has been running with no problems for the past year or so. We are on version 5.4, running in an OpenStack environment.

We had an outage, and it looks like some nodes didn't shut down properly.
After I restarted all ES instances, starting with the master nodes, then data, and finally client nodes, my data nodes are failing to join the cluster.

On the data nodes:

[2018-04-18T17:57:32,857][WARN ][o.e.n.Node ] [esdata-11] timed out while waiting for initial discovery state - timeout: 30s


[2018-04-18T17:25:10,267][INFO ][o.e.d.z.ZenDiscovery ] [esdata-11] failed to send join request to master [{master-1}{-0ZXpK0VRV6yf8QdTwewPg}{xayChL4oQB6etcAwguIfJQ}{}{}{ml.enabled=true}], reason [RemoteTransportException[[master-1][][internal:discovery/zen/join]]; nested: ConnectTransportException[[esdata-11][] connect_timeout[30s]]; nested: IOException[connection timed out:]; ]

On the Master node:

[2018-04-18T10:51:16,477][WARN ][o.e.x.m.e.l.LocalExporter] unexpected error while indexing monitoring document org.elasticsearch.xpack.monitoring.exporter.ExportException: UnavailableShardsException[[.monitoring-es-2-2018.04.18][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-2-2018.04.18][0]] containing [5] requests]]


[2018-04-18T13:24:49,315][WARN ][o.e.d.z.PublishClusterStateAction] [master-1] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{master-2}{0aacWZoiTNaNFxSIn0sETg}{L-jh0PFDRzaABYjcQLXmwA}{}{}{ml.enabled=true}])

[2018-04-18T14:05:04,037][WARN ][o.e.x.m.MonitoringService] [master-2] monitoring execution failed
org.elasticsearch.xpack.monitoring.exporter.ExportException: Exception when closing export bulk

I have looked at as many resources as possible, and each one offered a different solution, which in most cases didn't apply to me.

The weird thing is that the data nodes can see the masters changing, but they still fail to join!

Any idea what's wrong here? Is it OpenStack network related, or more ES related?

I was able to resolve my problem and get my cluster to start recovering by turning off iptables. More generally, iptables needs to be configured so the service can use the ports it needs, i.e. 9300 (transport) and 9200 (HTTP).
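For anyone hitting the same `ConnectTransportException`, a quick TCP check run from each data node can confirm whether the master's ports are actually reachable before you start digging into iptables rules. This is a minimal sketch; the `127.0.0.1` address is a placeholder for your master node's address:

```python
# Minimal TCP reachability check for the Elasticsearch transport (9300)
# and HTTP (9200) ports. The host below is a placeholder -- substitute
# the address of the node you are trying to reach.
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # Run from a data node against a master node's address (placeholder).
    for port in (9200, 9300):
        state = "open" if port_open("127.0.0.1", port) else "blocked/closed"
        print(f"port {port}: {state}")
```

If 9300 shows as blocked while the node is up, the firewall (iptables/security groups in OpenStack) is the likely culprit rather than Elasticsearch itself.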

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.