New nodes failing to join cluster

I have two nodes that have been out of the cluster for some time and I have just tried to bring them back in. Both fail in the same manner: connection timed out.

Tcpdump running on one of the cluster members shows the two machines communicating.

[2021-08-10T15:53:55,249][INFO ][o.e.c.c.JoinHelper       ] [secesprd05] failed to join {secesprd01}{kAWPcpoxSNSN9WlUsYlQlg}{IZs_lY1dStmeuqmhsgWQOQ}{10.6.0.67}{10.6.0.67:9300}{cdhmw}{xpack.installed=true, molochtype=hot, transform.node=false} with JoinRequest{sourceNode={secesprd05}{4cPiEfloRoKgvx-NqVp4aA}{lhmrpJpNQhuS_0hhrGDR5g}{130.216.236.212}{130.216.236.212:9300}{cd}{xpack.installed=true, transform.node=false}, minimumTerm=32, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [secesprd01][10.6.0.67:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [secesprd05][130.216.236.212:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:978) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:198) ~[elasticsearch-7.10.1.jar:7.10.1]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.10.1.jar:7.10.1]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.10.1.jar:7.10.1]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[transport-netty4-client-7.10.1.jar:7.10.1]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.IOException: connection timed out: 130.216.236.212/130.216.236.212:9300
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]

tcpdump shows:

15:53:46.258143 IP secesprd05.its.auckland.ac.nz.49094 > secesprd01.its.auckland.ac.nz.9300: Flags [P.], seq 567169938:567170919, ack 1195283306, win 501, options [nop,nop,TS val 2264656474 ecr 2515565298], length 981
        0x0000:  4500 0409 b743 4000 3f06 06b6 82d8 ecd4  E....C@.?.......
        0x0010:  0a06 0043 bfc6 2454 21ce 5392 473e 936a  ...C..$T!.S.G>.j
        0x0020:  8018 01f5 4bf7 0000 0101 080a 86fb ea5a  ....K..........Z
        0x0030:  95f0 7af2 4553 0000 03cf 0000 0000 0000  ..z.ES..........
        0x0040:  10ad 0000 6c56 c300 0000 8b01 1e5f 7870  ....lV......._xp
        0x0050:  6163 6b5f 7365 6375 7269 7479 5f61 7574  ack_security_aut
        0x0060:  6865 6e74 6963 6174 696f 6e40 7736 3278  hentication@w62x
        0x0070:  4177 4548 5833 4e35 6333 526c 6251 707a  AwEHX3N5c3RlbQpz
        0x0080:  5a57 4e6c 6333 4279 5a44 4131 4346 3966  ZWNlc3ByZDA1CF9f
        0x0090:  5958 5230 5957 4e6f 4346 3966 5958 5230  YXR0YWNoCF9fYXR0
        0x00a0:  5957 4e6f 4141 514b 4141 3d3d 0001 0678  YWNoAAQKAA==...x
        0x00b0:  2d70 6163 6b20 696e 7465 726e 616c 3a64  -pack.internal:d
        0x00c0:  6973 636f 7665 7279 2f72 6571 7565 7374  iscovery/request
        0x00d0:  5f70 6565 7273 000a 7365 6365 7370 7264  _peers..secesprd
        0x00e0:  3035 1634 6350 6945 666c 6f52 6f4b 6776  05.4cPiEfloRoKgv
        0x00f0:  782d 4e71 5670 3461 4116 6c68 6d72 704a  x-NqVp4aA.lhmrpJ
        0x0100:  704e 5168 7553 5f30 6868 7247 4452 3567  pNQhuS_0hhrGDR5g
        0x0110:  0f31 3330 2e32 3136 2e32 3336 2e32 3132  .130.216.236.212
        0x0120:  0f31 3330 2e32 3136 2e32 3336 2e32 3132  .130.216.236.212
        0x0130:  0482 d8ec d40f 3133 302e 3231 362e 3233  ......130.216.23
        0x0140:  362e 3231 3200 0024 5402 0f78 7061 636b  6.212..$T..xpack
        0x0150:  2e69 6e73 7461 6c6c 6564 0474 7275 650e  .installed.true.
        0x0160:  7472 616e 7366 6f72 6d2e 6e6f 6465 0566  transform.node.f
        0x0170:  616c 7365 0204 6461 7461 0164 0109 6461  alse..data.d..da
        0x0180:  7461 5f63 6f6c 6401 6301 a7ae b103 030a  ta_cold.c.......
        0x0190:  7365 6365 7370 7264 3031 166b 4157 5063  secesprd01.kAWPc
        0x01a0:  706f 7853 4e53 4e39 576c 5573 596c 516c  poxSNSN9WlUsYlQl
        0x01b0:  6716 495a 735f 6c59 3164 5374 6d65 7571  g.IZs_lY1dStmeuq
        0x01c0:  6d68 7367 5751 4f51 0931 302e 362e 302e  mhsgWQOQ.10.6.0.
        0x01d0:  3637 0931 302e 362e 302e 3637 040a 0600  67.10.6.0.67....
        0x01e0:  4309 3130 2e36 2e30 2e36 3700 0024 5403  C.10.6.0.67..$T.
        0x01f0:  0f78 7061 636b 2e69 6e73 7461 6c6c 6564  .xpack.installed
        0x0200:  0474 7275 650a 6d6f 6c6f 6368 7479 7065  .true.molochtype
        0x0210:  0368 6f74 0e74 7261 6e73 666f 726d 2e6e  .hot.transform.n
        0x0220:  6f64 6505 6661 6c73 6505 0464 6174 6101  ode.false..data.
        0x0230:  6401 0964 6174 615f 636f 6c64 0163 0108  d..data_cold.c..
        0x0240:  6461 7461 5f68 6f74 0168 0109 6461 7461  data_hot.h..data
        0x0250:  5f77 6172 6d01 7701 066d 6173 7465 7201  _warm.w..master.
        0x0260:  6d00 c3ad b103 0b73 6563 6d6f 6e70 7264  m......secmonprd
        0x0270:  3037 1654 4e48 6c64 4779 4151 3532 734e  07.TNHldGyAQ52sN
        0x0280:  6c49 6247 5062 674d 6716 724c 6359 4172  lIbGPbgMg.rLcYAr
        0x0290:  4637 5359 654b 414a 4c31 6a42 6a43 3667  F7SYeKAJL1jBjC6g
        0x02a0:  0d31 3330 2e32 3136 2e35 2e31 3131 0d31  .130.216.5.111.1
        0x02b0:  3330 2e32 3136 2e35 2e31 3131 0482 d805  30.216.5.111....
        0x02c0:  6f0d 3133 302e 3231 362e 352e 3131 3100  o.130.216.5.111.
        0x02d0:  0024 5403 0f78 7061 636b 2e69 6e73 7461  .$T..xpack.insta
        0x02e0:  6c6c 6564 0474 7275 650a 6d6f 6c6f 6368  lled.true.moloch
        0x02f0:  7479 7065 0477 6172 6d0e 7472 616e 7366  type.warm.transf
        0x0300:  6f72 6d2e 6e6f 6465 0566 616c 7365 0304  orm.node.false..
        0x0310:  6461 7461 0164 0109 6461 7461 5f77 6172  data.d..data_war
        0x0320:  6d01 7701 066d 6173 7465 7201 6d00 c3ad  m.w..master.m...
        0x0330:  b103 0a73 6563 6573 7072 6430 3216 3655  ...secesprd02.6U
        0x0340:  4461 674a 5732 5433 6557 4d2d 3050 514a  DagJW2T3eWM-0PQJ
        0x0350:  3072 4d41 1677 6230 6a6b 7a44 6b51 364b  0rMA.wb0jkzDkQ6K
        0x0360:  3536 4f4e 5777 6b58 336b 4109 3130 2e36  56ONWwkX3kA.10.6
        0x0370:  2e30 2e36 3809 3130 2e36 2e30 2e36 3804  .0.68.10.6.0.68.
        0x0380:  0a06 0044 0931 302e 362e 302e 3638 0000  ...D.10.6.0.68..
        0x0390:  2454 030f 7870 6163 6b2e 696e 7374 616c  $T..xpack.instal
        0x03a0:  6c65 6404 7472 7565 0a6d 6f6c 6f63 6874  led.true.molocht
        0x03b0:  7970 6503 686f 740e 7472 616e 7366 6f72  ype.hot.transfor
        0x03c0:  6d2e 6e6f 6465 0566 616c 7365 0504 6461  m.node.false..da
        0x03d0:  7461 0164 0109 6461 7461 5f63 6f6c 6401  ta.d..data_cold.
        0x03e0:  6301 0864 6174 615f 686f 7401 6801 0964  c..data_hot.h..d
        0x03f0:  6174 615f 7761 726d 0177 0106 6d61 7374  ata_warm.w..mast
        0x0400:  6572 016d 00c3 adb1 03                   er.m.....
15:53:46.258352 IP secesprd01.its.auckland.ac.nz.9300 > secesprd05.its.auckland.ac.nz.49094: Flags [P.], seq 1:347, ack 981, win 501, options [nop,nop,TS val 2515567299 ecr 2264656474], length 346
        0x0000:  4500 018e b83e 4000 4006 0736 0a06 0043  E....>@.@..6...C
        0x0010:  82d8 ecd4 2454 bfc6 473e 936a 21ce 5767  ....$T..G>.j!.Wg
        0x0020:  8018 01f5 7b76 0000 0101 080a 95f0 82c3  ....{v..........
        0x0030:  86fb ea5a 4553 0000 0154 0000 0000 0000  ...ZES...T......
        0x0040:  10ad 0100 6c56 c300 0000 6201 1e5f 7870  ....lV....b.._xp
        0x0050:  6163 6b5f 7365 6375 7269 7479 5f61 7574  ack_security_aut
        0x0060:  6865 6e74 6963 6174 696f 6e40 7736 3278  hentication@w62x
        0x0070:  4177 4548 5833 4e35 6333 526c 6251 707a  AwEHX3N5c3RlbQpz
        0x0080:  5a57 4e6c 6333 4279 5a44 4131 4346 3966  ZWNlc3ByZDA1CF9f
        0x0090:  5958 5230 5957 4e6f 4346 3966 5958 5230  YXR0YWNoCF9fYXR0
        0x00a0:  5957 4e6f 4141 514b 4141 3d3d 0001 0a73  YWNoAAQKAA==...s
        0x00b0:  6563 6573 7072 6430 3116 6b41 5750 6370  ecesprd01.kAWPcp
        0x00c0:  6f78 534e 534e 3957 6c55 7359 6c51 6c67  oxSNSN9WlUsYlQlg
        0x00d0:  1649 5a73 5f6c 5931 6453 746d 6575 716d  .IZs_lY1dStmeuqm
        0x00e0:  6873 6757 514f 5109 3130 2e36 2e30 2e36  hsgWQOQ.10.6.0.6
        0x00f0:  3709 3130 2e36 2e30 2e36 3704 0a06 0043  7.10.6.0.67....C
        0x0100:  0931 302e 362e 302e 3637 0000 2454 030f  .10.6.0.67..$T..
        0x0110:  7870 6163 6b2e 696e 7374 616c 6c65 6404  xpack.installed.
        0x0120:  7472 7565 0a6d 6f6c 6f63 6874 7970 6503  true.molochtype.
        0x0130:  686f 740e 7472 616e 7366 6f72 6d2e 6e6f  hot.transform.no
        0x0140:  6465 0566 616c 7365 0504 6461 7461 0164  de.false..data.d
        0x0150:  0109 6461 7461 5f63 6f6c 6401 6301 0864  ..data_cold.c..d
        0x0160:  6174 615f 686f 7401 6801 0964 6174 615f  ata_hot.h..data_
        0x0170:  7761 726d 0177 0106 6d61 7374 6572 016d  warm.w..master.m
        0x0180:  00c3 adb1 0300 0000 0000 0000 0020       ..............
15:53:46.258485 IP secesprd05.its.auckland.ac.nz.49094 > secesprd01.its.auckland.ac.nz.9300: Flags [.], ack 347, win 501, options [nop,nop,TS val 2264656474 ecr 2515567299], length 0
        0x0000:  4500 0034 b744 4000 3f06 0a8a 82d8 ecd4  E..4.D@.?.......
        0x0010:  0a06 0043 bfc6 2454 21ce 5767 473e 94c4  ...C..$T!.WgG>..
        0x0020:  8010 01f5 3775 0000 0101 080a 86fb ea5a  ....7u.........Z
        0x0030:  95f0 82c3                                ....

Any ideas as to what is going on?

The packet capture shows that secesprd05 has connected to secesprd01:9300, but the exception indicates a problem in the other direction: secesprd01 cannot connect to secesprd05:9300.
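To make that reading of the log concrete, here is a minimal sketch that pulls the two ends of the failed connection out of the exception text. It assumes the log lines look like the ones posted above; the function name `failing_direction` and the regular expressions are illustrative only and are not part of Elasticsearch.

```python
import re

# Assumed log shapes, taken from the exception text in this thread:
#   RemoteTransportException: [node][addr:port][action]   -> node that reported the error
#   ConnectTransportException: [node][addr:port] ...      -> node that could not be reached
REMOTE = re.compile(r"RemoteTransportException: \[([^\]]+)\]\[([^\]]+)\]")
CONNECT = re.compile(r"ConnectTransportException: \[([^\]]+)\]\[([^\]]+)\]")


def failing_direction(log_text):
    """Return who tried to connect to whom, or None if the pattern is absent."""
    remote = REMOTE.search(log_text)
    connect = CONNECT.search(log_text)
    if remote is None or connect is None:
        return None
    return {
        "from_node": remote.group(1),   # node on which the connect attempt failed
        "from_addr": remote.group(2),
        "to_node": connect.group(1),    # node it was trying to reach
        "to_addr": connect.group(2),
    }
```

Fed the two exception lines from the log above, this should report secesprd01 as the side that attempted the connection and secesprd05 (130.216.236.212:9300) as the side it could not reach, which is the reverse of the direction seen in the tcpdump.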

Thanks David, it is always the simple ones that get me. The firewall rules on the nodes that I was adding back lacked entries for two master-eligible members of the cluster.
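For anyone hitting the same thing, a quick way to catch a one-way firewall gap like this is to test the transport port in both directions rather than relying on a capture from one side. The following is only a sketch: the helper names `can_connect` and `unreachable` are made up for illustration, and port 9300 is the transport port seen in the log above. It only checks that a TCP connection can be opened, not that the Elasticsearch handshake succeeds.

```python
import socket


def can_connect(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False


def unreachable(peers, port=9300, timeout=3.0):
    """Return the subset of peers whose transport port cannot be reached."""
    return [host for host in peers if not can_connect(host, port, timeout)]
```

Run something like `unreachable(["130.216.236.212"])` on each master-eligible node, pointing at the node being added, and then run it again on the new node with the existing cluster members as the peers. A host that shows up in only one direction points at a firewall rule missing on one side.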
