Master node disconnected from the cluster and cannot reconnect

My Elasticsearch cluster was working fine.
Then suddenly one master node disconnected from the cluster and couldn't reconnect.
The other nodes still form a healthy cluster.

I checked the network and firewall and didn't find anything wrong.

[2018-12-31T08:57:24,048][INFO ][o.e.n.Node               ] [STATEOFUT] initializing ...
[2018-12-31T08:57:24,298][INFO ][o.e.e.NodeEnvironment    ] [STATEOFUT] using [1] data paths, mounts [[Data Disk (E:)]], net usable_space [1006.5gb], net total_space [1022.9gb], types [NTFS]
[2018-12-31T08:57:24,298][INFO ][o.e.e.NodeEnvironment    ] [STATEOFUT] heap size [3.9gb], compressed ordinary object pointers [true]
[2018-12-31T08:57:24,298][INFO ][o.e.n.Node               ] [STATEOFUT] node name [STATEOFUT], node ID [BV5mJhwhTXW9kDV3Vkn-bA]
[2018-12-31T08:57:24,298][INFO ][o.e.n.Node               ] [STATEOFUT] version[6.4.3], pid[14412], build[oss/zip/fe40335/2018-10-30T23:17:19.084789Z], OS[Windows Server 2012 R2/6.3/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_192/25.192-b12]
[2018-12-31T08:57:24,298][INFO ][o.e.n.Node               ] [STATEOFUT] JVM arguments [-Xms4g, -Xmx4g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=C:\Users\STATEO~1\AppData\Local\Temp\2\elasticsearch, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Delasticsearch, -Des.path.home=E:\DataFolder-StateOfMN\ELK\ElasticSearch\elasticsearch-6.4.3, -Des.path.conf=E:\DataFolder-StateOfMN\ELK\ElasticSearch\ElasticData\config, -Des.distribution.flavor=oss, -Des.distribution.type=zip, exit, -Xms4096m, -Xmx4096m, -Xss1024k]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [aggs-matrix-stats]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [analysis-common]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [ingest-common]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [lang-expression]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [lang-mustache]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [lang-painless]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [mapper-extras]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [parent-join]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [percolator]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [rank-eval]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [reindex]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [repository-url]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [transport-netty4]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded module [tribe]
[2018-12-31T08:57:27,228][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded plugin [ingest-geoip]
[2018-12-31T08:57:27,244][INFO ][o.e.p.PluginsService     ] [STATEOFUT] loaded plugin [ingest-user-agent]
[2018-12-31T08:57:41,068][INFO ][o.e.d.DiscoveryModule    ] [STATEOFUT] using discovery type [zen]
[2018-12-31T08:57:42,740][INFO ][o.e.n.Node               ] [STATEOFUT] initialized
[2018-12-31T08:57:42,740][INFO ][o.e.n.Node               ] [STATEOFUT] starting ...
[2018-12-31T08:57:43,099][INFO ][o.e.t.TransportService   ] [STATEOFUT] publish_address {10.0.0.7:9093}, bound_addresses {10.0.0.7:9093}
[2018-12-31T08:57:43,115][INFO ][o.e.b.BootstrapChecks    ] [STATEOFUT] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-12-31T08:58:13,198][WARN ][o.e.n.Node               ] [STATEOFUT] timed out while waiting for initial discovery state - timeout: 30s
[2018-12-31T08:58:13,198][INFO ][o.e.h.n.Netty4HttpServerTransport] [STATEOFUT] publish_address {10.0.0.7:5044}, bound_addresses {10.0.0.7:5044}
[2018-12-31T08:58:13,198][INFO ][o.e.n.Node               ] [STATEOFUT] started
[2018-12-31T08:58:16,496][INFO ][o.e.d.z.ZenDiscovery     ] [STATEOFUT] failed to send join request to master [{preshome-linux-2}{tgAK-cGrQ6WVqab5vKJ0ng}{hUfGPynvTUeLbD0lby_NBQ}{10.0.0.9}{10.0.0.9:9093}], reason [RemoteTransportException[[preshome-linux-2][10.0.0.9:9093][internal:discovery/zen/join]]; nested: ConnectTransportException[[STATEOFUT][10.0.0.7:9093] connect_timeout[30s]]; ]
[2018-12-31T08:58:49,535][INFO ][o.e.d.z.ZenDiscovery     ] [STATEOFUT] failed to send join request to master [{preshome-linux-2}{tgAK-cGrQ6WVqab5vKJ0ng}{hUfGPynvTUeLbD0lby_NBQ}{10.0.0.9}{10.0.0.9:9093}], reason [RemoteTransportException[[preshome-linux-2][10.0.0.9:9093][internal:discovery/zen/join]]; nested: ConnectTransportException[[STATEOFUT][10.0.0.7:9093] connect_timeout[30s]]; ]
[2018-12-31T08:59:22,549][INFO ][o.e.d.z.ZenDiscovery     ] [STATEOFUT] failed to send join request to master [{preshome-linux-2}{tgAK-cGrQ6WVqab5vKJ0ng}{hUfGPynvTUeLbD0lby_NBQ}{10.0.0.9}{10.0.0.9:9093}], reason [RemoteTransportException[[preshome-linux-2][10.0.0.9:9093][internal:discovery/zen/join]]; nested: ConnectTransportException[[STATEOFUT][10.0.0.7:9093] connect_timeout[30s]]; ]
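One way to read these join failures: the outer RemoteTransportException comes from the elected master (preshome-linux-2 at 10.0.0.9:9093), while the nested ConnectTransportException names STATEOFUT at 10.0.0.7:9093. That suggests the master received the join request but timed out opening a connection back to this node, so connectivity likely has to be verified in both directions, not only outward from the Windows host. The minimal Python sketch below pulls those addresses out of one of the log lines above; the regular expressions are written against these exact lines and are an assumption, not a general Elasticsearch log parser.

```python
import re

# One of the ZenDiscovery lines from the log above, copied verbatim.
LINE = ("[2018-12-31T08:58:16,496][INFO ][o.e.d.z.ZenDiscovery     ] [STATEOFUT] "
        "failed to send join request to master [{preshome-linux-2}{tgAK-cGrQ6WVqab5vKJ0ng}"
        "{hUfGPynvTUeLbD0lby_NBQ}{10.0.0.9}{10.0.0.9:9093}], reason [RemoteTransportException"
        "[[preshome-linux-2][10.0.0.9:9093][internal:discovery/zen/join]]; nested: "
        "ConnectTransportException[[STATEOFUT][10.0.0.7:9093] connect_timeout[30s]]; ]")

# The master's transport address is the last {ip:port} group before "], reason".
MASTER_RE = re.compile(r"failed to send join request to master \[.*\{([\d.]+:\d+)\}\]")
# The node and address that could not be reached, as reported by the nested exception.
TARGET_RE = re.compile(
    r"ConnectTransportException\[\[([^\]]+)\]\[([\d.]+:\d+)\] connect_timeout\[(\w+)\]")


def failed_direction(line):
    """Return who tried to connect where, or None if the line does not match."""
    master = MASTER_RE.search(line)
    target = TARGET_RE.search(line)
    if not (master and target):
        return None
    return {
        "master": master.group(1),
        "unreachable_node": target.group(1),
        "unreachable_address": target.group(2),
        "timeout": target.group(3),
    }


print(failed_direction(LINE))
```

For this line the result points from 10.0.0.9:9093 (the master) to 10.0.0.7:9093 (STATEOFUT) with a 30s timeout, i.e. the failing direction is Linux master to Windows node.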

How can I manually check whether the transport port connection is working?
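A plain ping only tests that the host answers; what matters here is whether a TCP handshake to the transport port succeeds. A minimal Python sketch of such a check, assuming Python 3 is available on the hosts (the function name `tcp_port_open` is hypothetical, not part of any Elasticsearch tooling):

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    Unlike ICMP ping, this performs a real TCP handshake against the port,
    which is roughly what `telnet host port` tests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts (socket.timeout is an
        # OSError subclass) and unreachable hosts.
        return False
```

For example, `tcp_port_open("10.0.0.9", 9093)` run on the Windows node tests one direction, and `tcp_port_open("10.0.0.7", 9093)` run on each Linux node tests the other; a result of `False` with a long pause usually means packets are being dropped (e.g. by a firewall) rather than refused.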

What are those ports you are using (5044, 9093, ...)?

Yes, I am using 5044 for http.port and 9093 for transport.

How many nodes do you have in the cluster? How are these configured?

My setup has 4 nodes:
data + master nodes => 2
data-only node => 1
master-only node => 1

Now the master-only node is not connecting to the cluster.

Do you have discovery.zen.minimum_master_nodes set to 2 as per these guidelines? Can you telnet to port 9093 on the other hosts from the one that does not connect to the rest of the cluster? Do all nodes have all master-eligible nodes listed in their config?
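The value 2 follows from the quorum rule for Zen discovery, minimum_master_nodes = (master_eligible_nodes / 2) + 1 with integer division. In this setup the master-eligible nodes are the 2 data+master nodes plus the 1 master-only node. A small Python sketch of the arithmetic (the function name is illustrative only):

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Setup from this thread: 2 data+master nodes + 1 master-only node = 3 master-eligible.
print(minimum_master_nodes(2 + 1))  # 2
```

With a quorum of 2, the two remaining master-eligible nodes can still elect a master on their own, which matches the report that the other nodes keep forming a healthy cluster while the master-only node is out.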

Yes, I set it to 2, and I listed all the master nodes in discovery.zen.ping.unicast.hosts in elasticsearch.yml.

Can you show the full configuration?

Can you telnet to port 9093 on the other hosts from the one that does not connect to the rest of the cluster?

I checked; it said the connection was fine. I don't understand the problem.

I used PsTools (PsPing).

As far as I know, ping does not check that you can connect to the port. Try connecting with telnet, as I mentioned earlier.

I checked with
telnet [ip] [port] for every IP and port combination.

It gives a blank screen when I press Enter.

I have not used telnet on Windows, so I am not sure what that means. Do you get the same thing if you telnet to the local node on the same port, e.g. telnet 10.0.0.7 9093?

It also gives the same blank black screen.

The master-only node runs Windows Server 2012 and all the other nodes are Ubuntu servers. Could that be a problem?

I saw this error on the Ubuntu master node.

I found the issue: even though port pinging worked fine, the firewall on the Linux system was blocking the connection. I added a new rule and now everything is working.
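This outcome is consistent with the join failures in the log: a check run only from the Windows node can succeed while the reverse direction is blocked. Assuming every node needs to reach every other node's transport port (the master connecting back to a joining node is one such case), a minimal Python sketch can enumerate the ordered (source, destination) pairs to test on each host. Only STATEOFUT (10.0.0.7) and preshome-linux-2 (10.0.0.9) come from the thread; the other two node names and addresses are hypothetical placeholders.

```python
from itertools import permutations

TRANSPORT_PORT = 9093

# Hypothetical inventory: only the first two entries appear in the thread.
NODES = {
    "STATEOFUT": "10.0.0.7",
    "preshome-linux-2": "10.0.0.9",
    "node-3": "<ip-of-node-3>",  # placeholder
    "node-4": "<ip-of-node-4>",  # placeholder
}


def required_checks(nodes, port):
    """Every ordered pair: run the check on `src` against dst_ip:port."""
    return [(src, nodes[dst], port) for src, dst in permutations(nodes, 2)]


for src, ip, port in required_checks(NODES, TRANSPORT_PORT):
    print(f"on {src}: check {ip}:{port}")
```

With 4 nodes this yields 12 checks, including the one that matters here: from preshome-linux-2 to 10.0.0.7:9093.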
