Recreation of ES data node failed

Good afternoon.

There was an elasticsearch cluster consisting of three data nodes and one coordinator with Kibana. It happened that one server died and was re-created. The ES version is the same, Java is the same, and the configuration in elasticsearch. yml is the same.

The service started, there are no errors, but this node did not appear in cluster monitoring. curl localhost:9200/_cat/nodes/ doesnt show problem node.

Please tell me what the problem may be?

Can you share log file from the started node ?

Cant put full log from /var/log/elasticsearch/cluster.log, its too big.

Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
	at org.elasticsearch.cluster.coordination.Coordinator$3.onFailure(Coordinator.java:500) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.coordination.JoinHelper$5.handleException(JoinHelper.java:359) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1124) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.transport.TcpTransport.lambda$handleException$24(TcpTransport.java:1001) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0.jar:7.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
Caused by: org.elasticsearch.transport.RemoteTransportException: [elk-es-node-02][10.199.5.104:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid VdZPYWRCT8eMvjFOpWB6Lw than local cluster uuid 693EJRStTo-3GwqanfvZkA, rejecting
	at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1077) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0.jar:7.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_211]
[2020-09-06T14:51:13,609][INFO ][o.e.c.c.JoinHelper       ] [elk-es-node-02] failed to join {elk-es-node-03}{sPaS62bdS8CJrY9MWNUkgQ}{eToptbY8TFahH6E1iv_RLQ}{10.199.5.105}{10.199.5.105:9300}{ml.machine_memory=12592672768, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={elk-es-node-02}{u2qN94UbTt6hHKl_ShftVw}{wsLKW5uGTUSnM6_YjvmxIQ}{10.199.5.104}{10.199.5.104:9300}{ml.machine_memory=12564250624, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [elk-es-node-03][10.199.5.105:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
	at org.elasticsearch.cluster.coordination.Coordinator$3.onFailure(Coordinator.java:500) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.coordination.JoinHelper$5.handleException(JoinHelper.java:359) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1124) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.transport.TcpTransport.lambda$handleException$24(TcpTransport.java:1001) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0.jar:7.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
Caused by: org.elasticsearch.transport.RemoteTransportException: [elk-es-node-02][10.199.5.104:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid VdZPYWRCT8eMvjFOpWB6Lw than local cluster uuid 693EJRStTo-3GwqanfvZkA, rejecting
	at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1077) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0.jar:7.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_211]
[2020-09-06T14:51:14,115][WARN ][r.suppressed             ] [elk-es-node-02] path: /_license, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:555) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.1.0.jar:7.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
[2020-09-06T14:51:14,356][WARN ][r.suppressed             ] [elk-es-node-02] path: /_license, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:259) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:322) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:249) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:555) [elasticsearch-7.1.0.jar:7.1.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.1.0.jar:7.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]

It does not seem like the new node is configured to link up with the other nodes. What does your elasticsearch.yml config file look like?

Looks a single node and not part of a cluster
Make sure you use same cluster config from your existing setup

Sorry, i've post elasticsearch.log at first time. Now its updated to cluster.log.

Looks like the problem in this "Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid VdZPYWRCT8eMvjFOpWB6Lw than local cluster uuid 693EJRStTo-3GwqanfvZkA, rejecting". But i dont understand how that can be fixed.

I think the new node has been started up without the correct config and created a cluster of its own. Wipe the storage of the new node and launch it with the correct config as that should allow it to join.

2 Likes

It worked! I realized that the problem was related to the fact that I first started ES with default configuration to check .conf, which caused a new cluster to be created, and then tried to attach the node to the old cluster and got an error.

Thanks all!

1 Like