Replacing Cluster Node

I used the following request, which moved all the shards off that node.
Then I shut down the system,
set up a new system with the same name/IP,
installed all the required RPMs,
and started the node, but it is not joining the cluster. What am I missing?

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "ip_address"
  }
}

Then, once the new system was ready, I removed that exclusion:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": ""
  }
}
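For anyone scripting this, here is a minimal sketch of how the two request bodies above could be built. The function names are hypothetical, and the IP in the example is simply the one that appears in the logs in this thread. The second helper uses null rather than an empty string, since assigning null is how the cluster settings API resets a setting to its default.

```python
import json

# Setting name taken from the requests above.
EXCLUDE_IP = "cluster.routing.allocation.exclude._ip"


def exclude_ip_body(ip_address):
    """Build the transient settings body that moves shards off one IP."""
    return {"transient": {EXCLUDE_IP: ip_address}}


def clear_exclusion_body():
    """Build the body that resets the exclusion (None serializes to null)."""
    return {"transient": {EXCLUDE_IP: None}}


if __name__ == "__main__":
    # Print both bodies so they can be pasted into PUT /_cluster/settings.
    print(json.dumps(exclude_ip_body("10.29.248.230"), indent=2))
    print(json.dumps(clear_exclusion_body(), indent=2))
```

The bodies are only printed here; sending them to the cluster is left to whatever HTTP client is already in use.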

[2019-09-16T16:25:56,482][WARN ][o.e.c.c.ClusterFormationFailureHelper] [elkm02] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [elkm01, elkm03, elkm04] to bootstrap a cluster: have discovered [{elkm02}{JG6KV5tpQ3W2x4rfvn3jRQ}{6SlplW6UQOqfzG7yPLGcOw}{10.29.248.230}{10.29.248.230:9300}{dim}{ml.machine_memory=101022859264, xpack.installed=true, ml.max_open_jobs=20}, {elkm01}{OvJiX8RNQn2Vb05oCuBJZQ}{3LndIPP0S--CBPEfPwtB9w}{10.29.248.229}{10.29.248.229:9300}{dim}{ml.machine_memory=101090594816, ml.max_open_jobs=20, xpack.installed=true}, {elkm03}{oYXgEVBgQoGToxFxhVoA3g}{p4-GG4UwQYCeYZR_SRKpSw}{10.29.248.235}{10.29.248.235:9300}{dim}{ml.machine_memory=24996524032, ml.max_open_jobs=20, xpack.installed=true}, {elkm04}{XTFxPJ6yRaeZVhHAg5uIAA}{r1Uki0XtRXucIau_4RUF-A}{10.29.248.236}{10.29.248.236:9300}{dim}{ml.machine_memory=101054177280, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [10.29.248.229:9300, 10.29.248.235:9300, 10.29.248.236:9300] from hosts providers and [{elkm02}{JG6KV5tpQ3W2x4rfvn3jRQ}{6SlplW6UQOqfzG7yPLGcOw}{10.29.248.230}{10.29.248.230:9300}{dim}{ml.machine_memory=101022859264, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 16, last-accepted version 0 in term 0

[2019-09-16T16:25:56,472][INFO ][o.e.c.c.JoinHelper ] [elkm02] failed to join {elkm03}{oYXgEVBgQoGToxFxhVoA3g}{p4-GG4UwQYCeYZR_SRKpSw}{10.29.248.235}{10.29.248.235:9300}{dim}{ml.machine_memory=24996524032, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={elkm02}{JG6KV5tpQ3W2x4rfvn3jRQ}{6SlplW6UQOqfzG7yPLGcOw}{10.29.248.230}{10.29.248.230:9300}{dim}{ml.machine_memory=101022859264, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}

Here is the elasticsearch.yml for the node:

cluster.name: my-cluster
cluster.initial_master_nodes: ["elkm01","elkm03","elkm04"]
node.name: elkm02
node.master: true
node.data: true
path.data: /elkdata01/elasticsearch
path.logs: /elkdata01/log/elasticsearch
network.host: 10.29.248.230
http.port: 9200
discovery.seed_hosts: ["elkm01","elkm02", "elkm03", "elkm04"]
discovery.find_peers_interval: 10s
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/config/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/config/elastic-certificates.p12

This message will have been followed by a stack trace. Can you share that stack trace?

Thank you @DavidTurner for looking into this.

Here is a follow-up with the stack trace.

I can resolve the hosts.
I can ping elkm02 and elkm03 from each other, and SSH works as well.

Then I looked further down the log and found a connection error.
I found that I had iptables and some other security settings turned on. I fixed that and restarted.
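A note for anyone who lands here with the same symptom: ping and SSH working does not show that the transport port is open. In the trace below, the failing connection is from elkm03 to elkm02 on port 9300, so that is the direction to test. The following is a minimal sketch of such a check. The function name is hypothetical, and port 9300 is assumed because it is the port shown in the logs.

```python
import socket


def can_connect(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts, DNS failures,
        # and "No route to host" errors like the one in the stack trace.
        return False
```

Run from an existing node, for example can_connect("10.29.248.230"), a result of False while ping still succeeds would point at a firewall rule rather than at routing or name resolution.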

It is working. :slight_smile: Thank you again for fixing it.

[2019-09-16T16:43:35,418][INFO ][o.e.c.c.JoinHelper ] [elkm02] last failed join attempt was 79ms ago, failed to join {elkm03}{oYXgEVBgQoGToxFxhVoA3g}{p4-GG4UwQYCeYZR_SRKpSw}{10.29.248.235}{10.29.248.235:9300}{dim}{ml.machine_memory=24996524032, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={elkm02}{JG6KV5tpQ3W2x4rfvn3jRQ}{ET1b0mWbSnaTqe_JAz07nQ}{10.29.248.230}{10.29.248.230:9300}{dim}{ml.machine_memory=101022859264, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}

org.elasticsearch.transport.RemoteTransportException: [elkm03][10.29.248.235:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [elkm02][10.29.248.230:9300] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:957) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:161) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.3.1.jar:7.3.1]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.3.1.jar:7.3.1]
at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:502) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:495) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:415) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:540) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:533) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:114) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.IOException: No route to host: 10.29.248.230/10.29.248.230:9300
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:835) ~[?:?]
Caused by: java.io.IOException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
at java.lang.Thread.run(Thread.java:835) ~[?:?]
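In a trace this long, the line that matters is usually the last "Caused by:" entry, which here is the "No route to host" error. As a rough sketch of pulling that out of a log automatically (the function names are hypothetical, and this only looks for lines starting with "Caused by: "):

```python
CAUSE_PREFIX = "Caused by: "


def cause_chain(log_text):
    """Return the text of every 'Caused by:' line, in order of appearance."""
    chain = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(CAUSE_PREFIX):
            chain.append(stripped[len(CAUSE_PREFIX):])
    return chain


def root_cause(log_text):
    """Return the last 'Caused by:' entry, or None if there is none."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None


if __name__ == "__main__":
    # Shortened excerpt of the trace above; the 'at ...' frames are omitted.
    sample = "\n".join([
        "org.elasticsearch.transport.RemoteTransportException: [elkm03][10.29.248.235:9300][internal:cluster/coordination/join]",
        "Caused by: org.elasticsearch.transport.ConnectTransportException: [elkm02][10.29.248.230:9300] connect_exception",
        "Caused by: java.io.IOException: No route to host: 10.29.248.230/10.29.248.230:9300",
        "Caused by: java.io.IOException: No route to host",
    ])
    print(root_cause(sample))
```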


I think you did the hard bit here, all I did was ask the right question :slightly_smiling_face: Glad it's working now.
