Elasticsearch 5.1.1 fails to form a cluster because of frequent ping timeouts


(Tan) #1

There are 8 nodes in total, three of them master-eligible. The nodes frequently time out when pinging each other, as the log below shows.
I am confident the network itself is fine. I also tried Elasticsearch 1.x, and it forms a cluster without problems.

Any advice will help, thanks.

------------------------------------------------Environment & Log--------------------------------------------------------------------------------------------

Plugins installed: [] (defaults only)

JVM version: Java(TM) SE Runtime Environment (build 1.8.0_111-b14)

OS version: 2.6.32-504.12.2.02.el6.x86_64 #1 SMP Tue May 12 11:44:09 CST 2015 x86_64 x86_64 x86_64 GNU/Linux

Provide logs (if relevant):
[2016-12-15T19:39:17,648][INFO ][o.e.c.s.ClusterService ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:39:42,556][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: NodeDisconnectedException[[node244][ee.ff.gg.hh:9371][internal:discovery/zen/join/validate] disconnected]; ]
[2016-12-15T19:40:02,565][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
[2016-12-15T19:40:25,433][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: NodeDisconnectedException[[node244][ee.ff.gg.hh:9371][internal:discovery/zen/join/validate] disconnected]; ]
[2016-12-15T19:40:42,442][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [NodeDisconnectedException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join] disconnected]]
[2016-12-15T19:41:03,531][INFO ][o.e.c.s.ClusterService ] [node244] detected_master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}, added {{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.x.xx.x}{x.x.x.x:9371},{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: zen-disco-receive(from master [master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371} committed version [115]])
[2016-12-15T19:41:09,541][INFO ][o.e.d.z.ZenDiscovery ] [node244] master_left [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2016-12-15T19:41:09,541][WARN ][o.e.d.z.ZenDiscovery ] [node244] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node243}{tEv4PrXSTdyecz2nClog4Q}{KQsJYnauSe2W0hxWPte4Iw}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node239}{oWLoFJf-Rs-3q-kK1Y_2BQ}{dekHpx5zQASxKgIO21n1Yg}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}

[2016-12-15T19:41:09,542][INFO ][o.e.c.s.ClusterService ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:41:28,555][INFO ][o.e.c.s.ClusterService ] [node244] master {new {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}}, removed {{node241}{gx3rLPdIQvSm_u6mLoIbtQ}{80MXRZOjSN6RAQ7gZGeUcQ}{xx.xx.xx.xx}{x.x.x.x:9371},}, added {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: zen-disco-receive(from master [master {node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371} committed version [120]])
[2016-12-15T19:41:34,564][INFO ][o.e.d.z.ZenDiscovery ] [node244] master_left [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2016-12-15T19:41:34,564][WARN ][o.e.d.z.ZenDiscovery ] [node244] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{node243}{tEv4PrXSTdyecz2nClog4Q}{KQsJYnauSe2W0hxWPte4Iw}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}
{node239}{oWLoFJf-Rs-3q-kK1Y_2BQ}{dekHpx5zQASxKgIO21n1Yg}{xx.xx.xx.xx}{xx.xx.xx.xx:9371}

[2016-12-15T19:41:34,565][INFO ][o.e.c.s.ClusterService ] [node244] removed {{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371},}, reason: master_failed ({node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371})
[2016-12-15T19:41:55,575][INFO ][o.e.d.z.ZenDiscovery ] [node244] failed to send join request to master [{node237}{FmyPJDzMR2KRSl10jPGT0Q}{hrOcPGYBTOSEQU5J7hOwVA}{aa.bb.cc.dd}{aa.bb.cc.dd:9371}], reason [RemoteTransportException[[node237][aa.bb.cc.dd:9371][internal:discovery/zen/join]]; nested: ConnectTransportException[[node244][ee.ff.gg.hh:9371] connect_timeout[30s]]; nested: IOException[Connection timed out: ee.ff.gg.hh/ee.ff.gg.hh:9371]; ]
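
The `connect_timeout[30s]` entries in the log above indicate that the master (node237) could not open a TCP connection back to node244 on the transport port 9371. An ICMP ping succeeding would not rule this out, so one way to double-check is to test the transport port itself from every node toward every other node. The following is only a minimal sketch of such a check: the helper names `can_connect` and `check_peers` are made up for illustration, and the peer list has to be filled in with the real (redacted here) addresses.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_peers(peers: Iterable[Tuple[str, int]], timeout: float = 5.0) -> Dict[str, bool]:
    """Try each (host, port) pair once and report reachability keyed by 'host:port'."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in peers}


# Example usage (hypothetical address; run this on each node against all others):
#   results = check_peers([("10.0.0.12", 9371)])
#   for peer, ok in results.items():
#       print(peer, "reachable" if ok else "NOT reachable")
```

Running it on node237 against node244 and then in the opposite direction would show whether the port is blocked one way only, which is the pattern the join failures hint at.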

