Error while clustering nodes from different host/machines (ES 7.0)

realjoonha · May 3, 2019, 8:17am

Hi,

I'm trying to cluster a number of nodes on different machines (with different IP addresses), and I'm having some trouble with it.

Below is the basic information about the error.
Please bear with me if it is insufficient: I'm a complete newbie to ES, so I don't fully understand everything I've done so far.

1. with curl, I get this:

{
  "name" : "node-1",
  "cluster_name" : "es1",
  "cluster_uuid" : "8R4RhJK9T-Wduft3dDID4Q",
  "version" : {
    "number" : "7.0.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "b7e28a7",
    "build_date" : "2019-04-05T22:55:32.697037Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.7.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

{
  "name" : "node-4",
  "cluster_name" : "neofect-es1",
  "cluster_uuid" : "j-vmLd7pRx63nKakE9UGjQ",
  "version" : {
    "number" : "7.0.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "b7e28a7",
    "build_date" : "2019-04-05T22:55:32.697037Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.7.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Here, node-1 is intended to be the master node on server #1, and node-4 is intended to be a servant node on server #2.

2. and the elasticsearch.yml files:

node-1 (master node):
cluster.name: es1
node.name: node-1
bootstrap.memory_lock: true
network.host: "IP for Server#1"
http.port: 9200
discovery.seed_hosts: ["IP for Server#1", "IP for Server#2"]
cluster.initial_master_nodes: ["node-1"]

node-4 (servant node):
cluster.name: es1
node.name: node-4
bootstrap.memory_lock: true
network.host: "IP for Server#2"
http.port: 9200
discovery.seed_hosts: ["IP for Server#1", "IP for Server#2"]
cluster.initial_master_nodes: ["node-1"]

3. and finally the logs

- from master node:

[2019-05-03T01:53:14,907][INFO ][o.e.c.c.JoinHelper       ] [node-1] failed to join {node-4}{N2eCzKe7R42BId4s2xg0Xg}{AMVCL8LVQ3y-fTNMR6yqWg}{"IP4Server#2"}{"IP4Server#2":9300}{ml.machine_memory=16728576000, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{sw4IUak8Q-ijNU4guTXI9g}{yK4nXxdrSFuCnSlhAPFEug}{"IP4Server#1"}{"IP4Server#1":9300}{ml.machine_memory=16728576000, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=10, lastAcceptedTerm=7, lastAcceptedVersion=51, sourceNode={node-1}{sw4IUak8Q-ijNU4guTXI9g}{yK4nXxdrSFuCnSlhAPFEug}{"IP4Server#1"}{"IP4Server#1":9300}{ml.machine_memory=16728576000, xpack.installed=true, ml.max_open_jobs=20}, targetNode={node-4}{N2eCzKe7R42BId4s2xg0Xg}{AMVCL8LVQ3y-fTNMR6yqWg}{"IP4Server#2"}{"IP4Server#2":9300}{ml.machine_memory=16728576000, ml.max_open_jobs=20, xpack.installed=true}}]}

org.elasticsearch.transport.RemoteTransportException: [node-4]["IP4Server#2":9300][internal:cluster/coordination/join]

Caused by: java.lang.IllegalStateException: failure when sending a validation request to node

        at org.elasticsearch.cluster.coordination.Coordinator$3.onFailure(Coordinator.java:500) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.cluster.coordination.JoinHelper$5.handleException(JoinHelper.java:351) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.transport.TcpTransport.lambda$handleException$24(TcpTransport.java:1001) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.0.0.jar:7.0.0]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]

        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1]["IP4Server#1":9300][internal:cluster/coordination/join/validate]

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid j-vmLd7pRx63nKakE9UGjQ than local cluster uuid 8R4RhJK9T-Wduft3dDID4Q, rejecting

        at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251) ~[?:?]

        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309) ~[?:?]

        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1077) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0.jar:7.0.0]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]

        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_191]

Here's where I'm stuck. Please help: what should I do to cluster these nodes?
What's wrong?

realjoonha · May 3, 2019, 8:18am

- from servant node:

[2019-05-03T01:53:10,653][WARN ][o.e.c.c.Coordinator      ] [node-4] failed to validate incoming join request from node [{node-1}{sw4IUak8Q-ijNU4guTXI9g}{yK4nXxdrSFuCnSlhAPFEug}{"IP4Server#1"}{"IP4Server#1":9300}{ml.machine_memory=16728576000, ml.max_open_jobs=20, xpack.installed=true}]

org.elasticsearch.transport.RemoteTransportException: [node-1]["IP4Server#1":9300][internal:cluster/coordination/join/validate]

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid j-vmLd7pRx63nKakE9UGjQ than local cluster uuid 8R4RhJK9T-Wduft3dDID4Q, rejecting

        at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251) ~[?:?]

        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309) ~[?:?]

        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1077) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.0.0.jar:7.0.0]

        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0.jar:7.0.0]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

[2019-05-03T01:53:10,807][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [node-4] Failed to clear cache for realms [[]]

DavidTurner · May 3, 2019, 8:39am

Hi @realjoonha,

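This message tells us that at some point in the past these nodes belonged to different clusters, but now you are trying to join them together:

    join validation on cluster state with a different cluster uuid j-vmLd7pRx63nKakE9UGjQ than local cluster uuid 8R4RhJK9T-Wduft3dDID4Q, rejecting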

This is not permitted, because merging two clusters together might lose data.
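The same mismatch is visible in the two curl responses posted earlier, where each node reports a different cluster_uuid. As a rough sketch of checking this before attempting a join, the snippet below compares the cluster_uuid field across responses. The function names cluster_uuids and in_same_cluster are made up for illustration, and the JSON strings are trimmed copies of the responses from this thread rather than live requests.

```python
import json


def cluster_uuids(responses):
    """Map node name -> cluster_uuid from root-endpoint JSON bodies."""
    uuids = {}
    for body in responses:
        info = json.loads(body)
        uuids[info["name"]] = info["cluster_uuid"]
    return uuids


def in_same_cluster(responses):
    """True only if every node reports the same cluster_uuid."""
    return len(set(cluster_uuids(responses).values())) == 1


# Trimmed copies of the two responses posted earlier in this thread.
node_1 = '{"name": "node-1", "cluster_name": "es1", "cluster_uuid": "8R4RhJK9T-Wduft3dDID4Q"}'
node_4 = '{"name": "node-4", "cluster_name": "neofect-es1", "cluster_uuid": "j-vmLd7pRx63nKakE9UGjQ"}'

if __name__ == "__main__":
    print(cluster_uuids([node_1, node_4]))
    print(in_same_cluster([node_1, node_4]))
```

If the UUIDs differ, the nodes have each committed to their own cluster and will reject each other's join requests, which is what the logs above show.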

If you're just starting out and haven't put any data into these nodes yet then the best thing to do is to shut down node-4, delete its whole data directory, and then restart it.
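For anyone who prefers to script the "delete its whole data directory" step, here is a minimal sketch. It assumes the node is already shut down and that you know where its data directory is (for a tar install that is normally the data directory under the Elasticsearch home, unless path.data points elsewhere). The helper name wipe_data_dir and its dry_run flag are hypothetical, not part of Elasticsearch, and this destroys any data held by that node.

```python
import shutil
from pathlib import Path


def wipe_data_dir(data_dir, dry_run=True):
    """Delete everything inside data_dir, keeping the directory itself.

    Returns the names of the entries that were (or would be) removed.
    Only call this while the node is stopped.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    entries = sorted(root.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove a real subdirectory and its contents
            else:
                entry.unlink()  # remove a plain file or a symlink
    return [entry.name for entry in entries]
```

Running it once with the default dry_run=True lists what would be deleted; passing dry_run=False performs the deletion, after which node-4 can be restarted so it joins node-1's cluster with a clean state.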

See these docs for more information.

realjoonha · May 3, 2019, 9:49am

Oh, wow. It helped. I successfully clustered the nodes.
However, I have one more question, about why 'node-4' had been in another cluster in the past.
If I set up two nodes on different machines/hosts, run each of them on its own localhost, and then change the configuration to join them into one cluster, will I always get the error I got here?

To put it differently, I want to know whether nodes always have to be clustered serially.

Thanks a lot!

DavidTurner · May 3, 2019, 10:22am

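Yes, if each node has already started up on its own and formed its own cluster, you will always get this error when you later try to join them together.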

No, the nodes don't have to be started one at a time: you can start all of them at once. As long as they all have the same setting for cluster.initial_master_nodes they will correctly join up into a single cluster.
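One way to catch a mismatch before the first start is to compare the setting across each node's elasticsearch.yml. The sketch below is only an illustration: parse_settings and mismatched are hypothetical helpers, the parser handles only flat one-line "key: value" settings like the excerpts in this thread (a real YAML parser would be the better choice), and the config strings are shortened stand-ins for the real files.

```python
def parse_settings(text):
    """Parse flat 'key: value' lines from an elasticsearch.yml excerpt."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")  # split on the first colon only
        settings[key.strip()] = value.strip()
    return settings


def mismatched(configs, key):
    """Return {node: value} if the nodes disagree on key, else an empty dict."""
    values = {name: parse_settings(text).get(key) for name, text in configs.items()}
    return values if len(set(values.values())) > 1 else {}


# Shortened stand-ins for the two elasticsearch.yml files in this thread.
configs = {
    "node-1": 'cluster.name: es1\nnode.name: node-1\ncluster.initial_master_nodes: ["node-1"]',
    "node-4": 'cluster.name: es1\nnode.name: node-4\ncluster.initial_master_nodes: ["node-1"]',
}

if __name__ == "__main__":
    for key in ("cluster.name", "cluster.initial_master_nodes"):
        print(key, mismatched(configs, key) or "consistent")
```

An empty result for cluster.name and cluster.initial_master_nodes suggests the nodes should bootstrap into one cluster when started together, provided none of them has already formed a cluster of its own.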
