Unable to add second node to Elasticsearch cluster

Hi,
I have a two-node cluster setup (ES version 7.1) which was running absolutely fine until one of the nodes became unreachable. After bringing the node back up and starting Elasticsearch, it does not rejoin the cluster.
Please find below the elasticsearch.yml config:

    cluster.name: oep_np
    node.name: ausdlovpes01_1
    node.attr.rack: dev
    node.max_local_storage_nodes: 2
    node.master: true
    node.data: true
    path.data: /u01/es/data/es_01
    path.logs: /u01/es/logs/es_01
    bootstrap.memory_lock: true
    network.host: 10.179.192.121
    http.port: 8080
    transport.port: 8200
    transport.publish_port: 8081
    transport.profiles.default.port: 8081
    discovery.seed_hosts: ["10.179.192.121:8081", "10.179.200.12:8081"]
    cluster.initial_master_nodes: ["ausdlovpes01_1","ausilovpes01_1"]
    gateway.recover_after_nodes: 2
    cluster.routing.allocation.enable: none
    cluster.routing.allocation.same_shard.host: true
    xpack.security.enabled: false
    logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService: TRACE
    logger.org.elasticsearch.discovery: TRACE
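
One thing I noticed while writing this up: the config sets both transport.port: 8200 and transport.profiles.default.port: 8081. As far as I understand, the default-profile setting takes precedence, and the logs below confirm both nodes bind to 8081. To double-check which transport address is actually bound, I can query the nodes info API over the HTTP port (8080 per the config above):

    # show bound/publish transport addresses for every node
    curl -s 'http://10.179.192.121:8080/_nodes/transport?pretty'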

From the log trace of the master node it looks like the second node joins the cluster and then leaves it within seconds, and I cannot work out the reason. Please help me solve this issue.
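
While this is happening, I watch the membership flapping with the cat nodes and cluster health APIs (a quick sketch; host and port are from my config above):

    # list current cluster members and mark the elected master
    curl -s 'http://10.179.192.121:8080/_cat/nodes?v'
    # number_of_nodes drops back to 1 each time the second node is removed
    curl -s 'http://10.179.192.121:8080/_cluster/health?pretty'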

Please find below the logs of the master node (ausdlovpes01_1):

[2020-03-19T07:45:48,084][INFO ][o.e.n.Node               ] [ausdlovpes01_1] starting ...
[2020-03-19T07:45:48,244][INFO ][o.e.t.TransportService   ] [ausdlovpes01_1] publish_address {10.179.192.121:8081}, bound_addresses {10.179.192.121:8081}
[2020-03-19T07:45:48,252][INFO ][o.e.b.BootstrapChecks    ] [ausdlovpes01_1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-03-19T07:45:48,259][DEBUG][o.e.d.SeedHostsResolver  ] [ausdlovpes01_1] using max_concurrent_resolvers [10], resolver timeout [5s]
[2020-03-19T07:45:48,260][INFO ][o.e.c.c.Coordinator      ] [ausdlovpes01_1] cluster UUID [zOSPKsdfSamulWnZ0syk5Q]
[2020-03-19T07:45:48,264][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] activating with nodes:
   {ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, xpack.installed=true, ml.max_open_jobs=20}, local

[2020-03-19T07:45:48,266][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] probing master nodes from cluster state: nodes:
   {ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, xpack.installed=true, ml.max_open_jobs=20}, local

[2020-03-19T07:45:48,266][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] startProbe(10.179.192.121:8081) not probing local node
[2020-03-19T07:45:48,287][TRACE][o.e.d.SeedHostsResolver  ] [ausdlovpes01_1] resolved host [10.179.192.121:8081] to [10.179.192.121:8081]
[2020-03-19T07:45:48,288][TRACE][o.e.d.SeedHostsResolver  ] [ausdlovpes01_1] resolved host [10.179.200.12:8081] to [10.179.200.12:8081]
[2020-03-19T07:45:48,290][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] probing resolved transport addresses [10.179.200.12:8081]
[2020-03-19T07:45:48,291][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] Peer{transportAddress=10.179.200.12:8081, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2020-03-19T07:45:48,295][TRACE][o.e.d.HandshakingTransportAddressConnector] [ausdlovpes01_1] [connectToRemoteMasterNode[10.179.200.12:8081]] opening probe connection
[2020-03-19T07:45:48,331][DEBUG][o.e.d.PeerFinder         ] [ausdlovpes01_1] Peer{transportAddress=10.179.200.12:8081, discoveryNode=null, peersRequestInFlight=false} connection failed
org.elasticsearch.transport.ConnectTransportException: [][10.179.200.12:8081] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1299) ~[elasticsearch-7.1.0.jar:7.1.0]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:99) ~[elasticsearch-7.1.0.jar:7.1.0]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.1.0.jar:7.1.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.1.0.jar:7.1.0]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.179.200.12:8081
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        ... 6 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        ... 6 more
[2020-03-19T07:45:48,387][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] deactivating and setting leader to {ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, xpack.installed=true, ml.max_open_jobs=20}
[2020-03-19T07:45:48,388][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] not active
[2020-03-19T07:45:48,412][INFO ][o.e.c.r.a.AllocationService] [ausdlovpes01_1] updating number_of_replicas to [0] for indices [.kibana_task_manager, .kibana_2, .kibana_1, .tasks]
[2020-03-19T07:45:48,424][INFO ][o.e.c.s.MasterService    ] [ausdlovpes01_1] elected-as-master ([1] nodes joined)[{ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 18, version: 112085, reason: master node changed {previous [], current [{ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, xpack.installed=true, ml.max_open_jobs=20}]}
[2020-03-19T07:45:48,542][INFO ][o.e.c.s.ClusterApplierService] [ausdlovpes01_1] master node changed {previous [], current [{ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, xpack.installed=true, ml.max_open_jobs=20}]}, term: 18, version: 112085, reason: Publication{term=18, version=112085}
[2020-03-19T07:45:48,591][INFO ][o.e.h.AbstractHttpServerTransport] [ausdlovpes01_1] publish_address {10.179.192.121:8080}, bound_addresses {10.179.192.121:8080}
[2020-03-19T07:45:48,591][INFO ][o.e.n.Node               ] [ausdlovpes01_1] started
[2020-03-19T07:45:49,271][TRACE][o.e.d.PeerFinder         ] [ausdlovpes01_1] not active
[2020-03-19T07:46:15,860][INFO ][o.e.c.r.a.AllocationService] [ausdlovpes01_1] updating number_of_replicas to [1] for indices [.kibana_task_manager, .kibana_2, .kibana_1, .tasks]
[2020-03-19T07:46:15,862][INFO ][o.e.c.s.MasterService    ] [ausdlovpes01_1] node-join[{ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 18, version: 112086, reason: added {{ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, ml.max_open_jobs=20, xpack.installed=true},}
[2020-03-19T07:46:16,029][INFO ][o.e.c.s.ClusterApplierService] [ausdlovpes01_1] added {{ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, ml.max_open_jobs=20, xpack.installed=true},}, term: 18, version: 112086, reason: Publication{term=18, version=112086}
[2020-03-19T07:46:16,114][INFO ][o.e.c.r.a.DiskThresholdMonitor] [ausdlovpes01_1] low disk watermark [85%] exceeded on [2jifvTz5SeuUc1MljZua2g][ausdlovpes01_1][/u01/es/data/es_01/nodes/0] free: 4.4gb[11.3%], replicas will not be assigned to this node
[2020-03-19T07:46:16,115][INFO ][o.e.c.r.a.DiskThresholdMonitor] [ausdlovpes01_1] low disk watermark [85%] exceeded on [J8s6PJ27SCa5ymJsA41Vzg][ausilovpes01_1][/u01/es/data/es_01/nodes/0] free: 5.8gb[14.9%], replicas will not be assigned to this node
[2020-03-19T07:46:16,402][INFO ][o.e.l.LicenseService     ] [ausdlovpes01_1] license [138c5a33-124d-4cd9-8dfd-c1e41e814366] mode [basic] - valid
[2020-03-19T07:46:16,415][INFO ][o.e.g.GatewayService     ] [ausdlovpes01_1] recovered [5] indices into cluster_state
[2020-03-19T07:46:18,030][INFO ][o.e.c.r.a.AllocationService] [ausdlovpes01_1] updating number_of_replicas to [0] for indices [.kibana_task_manager, .kibana_2, .kibana_1, .tasks]
[2020-03-19T07:46:18,031][INFO ][o.e.c.s.MasterService    ] [ausdlovpes01_1] node-left[{ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, ml.max_open_jobs=20, xpack.installed=true} followers check retry count exceeded], term: 18, version: 112091, reason: removed {{ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, ml.max_open_jobs=20, xpack.installed=true},}
[2020-03-19T07:46:18,103][INFO ][o.e.c.s.ClusterApplierService] [ausdlovpes01_1] removed {{ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, ml.max_open_jobs=20, xpack.installed=true},}, term: 18, version: 112091, reason: Publication{term=18, version=112091}
[2020-03-19T07:46:18,187][INFO ][o.e.c.r.a.AllocationService] [ausdlovpes01_1] updating number_of_replicas to [1] for indices [.kibana_task_manager, .kibana_2, .kibana_1, .tasks]

(and this node-join/node-left cycle continues...)
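
The node-left reason above is "followers check retry count exceeded", which, as I understand it, means the master's periodic follower checks to the second node failed more times than cluster.fault_detection.follower_check.retry_count allows. To see the effective fault-detection settings (including defaults) I can run:

    # dump effective cluster settings and filter for the follower/leader checks
    curl -s 'http://10.179.192.121:8080/_cluster/settings?include_defaults=true&flat_settings=true&pretty' | grep fault_detection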

Disk usage on node ausdlovpes01_1 is at 89%, leaving 4.5 GB free (not sure whether this could have caused any exception).
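
To see the disk usage as Elasticsearch itself reports it (the watermark messages above suggest both nodes are past the 85% low watermark), I use:

    # per-node disk usage and shard counts as seen by the shard allocator
    curl -s 'http://10.179.192.121:8080/_cat/allocation?v'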

Adding the logs of the second node (ausilovpes01_1):

[2020-03-19T07:46:15,031][INFO ][o.e.n.Node               ] [ausilovpes01_1] starting ...
[2020-03-19T07:46:15,162][INFO ][o.e.t.TransportService   ] [ausilovpes01_1] publish_address {10.179.200.12:8081}, bound_addresses {10.179.200.12:8081}
[2020-03-19T07:46:15,171][INFO ][o.e.b.BootstrapChecks    ] [ausilovpes01_1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-03-19T07:46:15,179][DEBUG][o.e.d.SeedHostsResolver  ] [ausilovpes01_1] using max_concurrent_resolvers [10], resolver timeout [5s]
[2020-03-19T07:46:15,180][INFO ][o.e.c.c.Coordinator      ] [ausilovpes01_1] cluster UUID [zOSPKsdfSamulWnZ0syk5Q]
[2020-03-19T07:46:15,184][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] activating with nodes:
   {ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, xpack.installed=true, ml.max_open_jobs=20}, local

[2020-03-19T07:46:15,185][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] probing master nodes from cluster state: nodes:
   {ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, xpack.installed=true, ml.max_open_jobs=20}, local

[2020-03-19T07:46:15,186][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] startProbe(10.179.200.12:8081) not probing local node
[2020-03-19T07:46:15,196][TRACE][o.e.d.SeedHostsResolver  ] [ausilovpes01_1] resolved host [10.179.192.121:8081] to [10.179.192.121:8081]
[2020-03-19T07:46:15,196][TRACE][o.e.d.SeedHostsResolver  ] [ausilovpes01_1] resolved host [10.179.200.12:8081] to [10.179.200.12:8081]
[2020-03-19T07:46:15,197][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] probing resolved transport addresses [10.179.192.121:8081]
[2020-03-19T07:46:15,198][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] Peer{transportAddress=10.179.192.121:8081, discoveryNode=null, peersRequestInFlight=false} attempting connection
[2020-03-19T07:46:15,202][TRACE][o.e.d.HandshakingTransportAddressConnector] [ausilovpes01_1] [connectToRemoteMasterNode[10.179.192.121:8081]] opening probe connection
[2020-03-19T07:46:15,320][TRACE][o.e.d.HandshakingTransportAddressConnector] [ausilovpes01_1] [connectToRemoteMasterNode[10.179.192.121:8081]] opened probe connection
[2020-03-19T07:46:15,381][TRACE][o.e.d.HandshakingTransportAddressConnector] [ausilovpes01_1] [connectToRemoteMasterNode[10.179.192.121:8081]] handshake successful: {ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, ml.max_open_jobs=20, xpack.installed=true}
[2020-03-19T07:46:15,544][TRACE][o.e.d.HandshakingTransportAddressConnector] [ausilovpes01_1] [connectToRemoteMasterNode[10.179.192.121:8081]] full connection successful: {ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, ml.max_open_jobs=20, xpack.installed=true}
[2020-03-19T07:46:15,545][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] Peer{transportAddress=10.179.192.121:8081, discoveryNode={ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2020-03-19T07:46:15,613][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] Peer{transportAddress=10.179.192.121:8081, discoveryNode={ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=true} received PeersResponse{masterNode=Optional[{ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, ml.max_open_jobs=20, xpack.installed=true}], knownPeers=[], term=18}
[2020-03-19T07:46:16,188][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] Peer{transportAddress=10.179.192.121:8081, discoveryNode={ausdlovpes01_1}{2jifvTz5SeuUc1MljZua2g}{yYpuu9D7Q0q3QHprUnpPVQ}{10.179.192.121}{10.179.192.121:8081}{ml.machine_memory=8182054912, rack=dev, ml.max_open_jobs=20, xpack.installed=true}, peersRequestInFlight=false} requesting peers
[2020-03-19T07:46:16,189][TRACE][o.e.d.PeerFinder         ] [ausilovpes01_1] probing master nodes from cluster state: nodes:
   {ausilovpes01_1}{J8s6PJ27SCa5ymJsA41Vzg}{gkjxsliJRKiG6RDswo380A}{10.179.200.12}{10.179.200.12:8081}{ml.machine_memory=8182046720, rack=dev_replica, xpack.installed=true, ml.max_open_jobs=20}, local

(and this repeats from the start of the probing...)
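
Since the master's very first probe got "Connection refused" (presumably just because the second node had not finished starting at 07:45:48), I also sanity-checked raw transport reachability in both directions (assuming nc is available on these hosts):

    # run on ausdlovpes01_1:
    nc -vz 10.179.200.12 8081
    # run on ausilovpes01_1:
    nc -vz 10.179.192.121 8081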
