Elasticsearch 7.3.1 - Help with a three-node cluster - Master not discovered yet

dcenaculo · February 18, 2021, 1:36pm

Hi,

I had Elasticsearch working well on the PORSCHE node. Then I tried to add two more nodes, OPEL and MORRIS. All three are on the same subnet:

PORSCHE: 10.146.114.66
MORRIS: 10.146.114.89
OPEL: 10.146.114.88

This is the current configuration (elasticsearch.yml) on each node. Only node.name and network.host differ from node to node:

cluster.name: AA-PT-Monitoring
node.name: MORRIS
network.host: 10.146.114.89
http.port: 9200
discovery.seed_hosts: ["PORSCHE", "MORRIS", "OPEL"]

I also tried with this line enabled, but it didn't work either:
#cluster.initial_master_nodes: ["PORSCHE", "MORRIS", "OPEL"]
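A minimal sketch, written in Python 3 for illustration, of how the per-node files could be generated so that only node.name and network.host differ. The helper name render_config and the NODES mapping are hypothetical. Writing the seed hosts as IP:port instead of hostnames is an assumption meant to avoid depending on name resolution (the PORSCHE log below lists a fe80:: link-local address among its discovery targets). Per the Elasticsearch 7.x documentation, every entry in cluster.initial_master_nodes must match a node's node.name exactly, and the setting is only used when a brand-new cluster bootstraps for the first time.

```python
# Hypothetical helper: renders elasticsearch.yml for each node of the
# three-node cluster described in this thread.
NODES = {
    "PORSCHE": "10.146.114.66",
    "MORRIS": "10.146.114.89",
    "OPEL": "10.146.114.88",
}


def render_config(node_name: str, bootstrap: bool = True) -> str:
    """Return elasticsearch.yml text for one node.

    Only node.name and network.host vary between nodes.
    """
    if node_name not in NODES:
        raise KeyError(f"unknown node {node_name!r}")
    # Assumption: seed hosts given as IP:port (transport port 9300)
    # so discovery does not rely on DNS or hosts-file entries.
    seed_hosts = ", ".join(f'"{ip}:9300"' for ip in NODES.values())
    lines = [
        "cluster.name: AA-PT-Monitoring",
        f"node.name: {node_name}",
        f"network.host: {NODES[node_name]}",
        "http.port: 9200",
        f"discovery.seed_hosts: [{seed_hosts}]",
    ]
    if bootstrap:
        # Only needed for the very first cluster formation; the names
        # must match node.name exactly and be identical on every node.
        masters = ", ".join(f'"{name}"' for name in NODES)
        lines.append(f"cluster.initial_master_nodes: [{masters}]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name in NODES:
        print(f"# --- {name} ---")
        print(render_config(name), end="")
```

Once the cluster has formed, the files can be regenerated with bootstrap=False so that cluster.initial_master_nodes is dropped, as the documentation recommends.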

The cluster never worked, and now even PORSCHE, which used to work on its own, no longer works.

PORSCHE error message:

[2021-02-18T13:08:06,784][WARN ][o.e.c.c.ClusterFormationFailureHelper] [PORSCHE] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{PORSCHE}{C52I2unxSe-IFq73oXKlow}{RyX_On2ORGq2FbYO6bXcnA}{10.146.114.66}{10.146.114.66:9300}{dim}{ml.machine_memory=25769332736, xpack.installed=true, ml.max_open_jobs=20}]; discovery will continue using [[fe80::11cf:c7da:d2b6:e5b4]:9300, 10.146.114.89:9300, 10.146.114.88:9300] from hosts providers and [{PORSCHE}{C52I2unxSe-IFq73oXKlow}{RyX_On2ORGq2FbYO6bXcnA}{10.146.114.66}{10.146.114.66:9300}{dim}{ml.machine_memory=25769332736, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

MORRIS error message:

[2021-02-18T12:56:53,254][WARN ][o.e.c.c.ClusterFormationFailureHelper] [MORRIS] master not discovered yet: have discovered [{MORRIS}{eybhidjOTcGMuYhtajvzPw}{79dXWumgSt-JwdInWTldbg}{10.146.114.89}{10.146.114.89:9300}{di}{ml.machine_memory=25769332736, xpack.installed=true, ml.max_open_jobs=20}, {PORSCHE}{C52I2unxSe-IFq73oXKlow}{RyX_On2ORGq2FbYO6bXcnA}{10.146.114.66}{10.146.114.66:9300}{dim}{ml.machine_memory=25769332736, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [10.146.114.66:9300, [fe80::8c6c:4377:ebb4:43ad]:9300, 10.146.114.88:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term

OPEL error message:

[2021-02-18T13:32:34,015][WARN ][o.e.c.c.ClusterFormationFailureHelper] [OPEL] master not discovered yet: have discovered [{OPEL}{FDsPmsvZTrOaEkH4mUqQAQ}{tCjyk9X_RC6QJHpOODeAZA}{10.146.114.88}{10.146.114.88:9300}{di}{ml.machine_memory=25769332736, xpack.installed=true, ml.max_open_jobs=20}, {PORSCHE}{C52I2unxSe-IFq73oXKlow}{tDfK5OqWTZ273bgjS-eNmA}{10.146.114.66}{10.146.114.66:9300}{dim}{ml.machine_memory=25769332736, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [10.146.114.66:9300, 10.146.114.89:9300, [fe80::f41e:df94:2ca2:b886]:9300] from hosts providers and [] from last-known cluster state; node term 462, last-accepted version 230 in term 462

I'm new to this and I need to be able to create the cluster. Can you help, please?

dcenaculo · February 18, 2021, 10:19pm

Hi,

I deleted the data folder and enabled this on all three nodes:

cluster.initial_master_nodes: ["PORSCHE", "MORRIS", "OPEL"]

PORSCHE and OPEL have formed a cluster, but MORRIS still can't join them. It gives this error:

[2021-02-18T22:06:32,018][INFO ][o.e.c.c.JoinHelper       ] [MORRIS] last failed join attempt was 2.9s ago, failed to join {OPEL}{Q8H1KKbrS3CEbCNJxxoTsg}{8q9RyWoAQGedDcs8cNvdZA}{10.146.114.88}{10.146.114.88:9300}{dim}{ml.machine_memory=25769332736, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={MORRIS}{wR9BwagXTMybyWFFsOWKrg}{MF292sNUSSmBiMcox0h4hw}{10.146.114.89}{10.146.114.89:9300}{dim}{ml.machine_memory=25769332736, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional[Join{term=1, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={MORRIS}{wR9BwagXTMybyWFFsOWKrg}{MF292sNUSSmBiMcox0h4hw}{10.146.114.89}{10.146.114.89:9300}{dim}{ml.machine_memory=25769332736, xpack.installed=true, ml.max_open_jobs=20}, targetNode={OPEL}{Q8H1KKbrS3CEbCNJxxoTsg}{8q9RyWoAQGedDcs8cNvdZA}{10.146.114.88}{10.146.114.88:9300}{dim}{ml.machine_memory=25769332736, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [OPEL][10.146.114.88:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.transport.ConnectTransportException: [MORRIS][10.146.114.89:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:957) ~[elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:161) ~[elasticsearch-7.3.1.jar:7.3.1]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.3.1.jar:7.3.1]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.3.1.jar:7.3.1]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:68) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:502) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:495) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:415) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:540) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:533) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:114) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.IOException: Connection timed out: no further information: 10.146.114.89/10.146.114.89:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]
Caused by: java.io.IOException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:670) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]

Do you know what may be causing this?

Thanks in advance
Daniel

stephenb · February 18, 2021, 11:48pm

dcenaculo:

Caused by: org.elasticsearch.transport.ConnectTransportException: [MORRIS][10.146.114.89:9300] connect_exception
...

Caused by: java.io.IOException: Connection timed out: no further information

Looks like a connectivity issue to me.
Can you telnet from OPEL to MORRIS on port 9300?
From OPEL:
telnet 10.146.114.89 9300
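If telnet is not available on the host, the same reachability test can be scripted. A minimal Python sketch using only the standard library; can_connect is a hypothetical helper name. As general TCP behaviour, a connection that times out (as in the stack trace above) usually means packets are being dropped, for example by a firewall, whereas an immediate refusal usually means nothing is listening on that port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Roughly what `telnet <host> <port>` checks: the port is reachable
    (not blocked by a firewall) and something is listening on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (e.g. firewall drops), refused connections,
        # and unreachable hosts; socket.timeout is a subclass of OSError.
        return False
```

Run on OPEL, can_connect("10.146.114.89", 9300) would be expected to return False while the MORRIS firewall drops the traffic, and True once an inbound rule allows it.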

dcenaculo · February 19, 2021, 12:57pm

Hi Stephen,

You were right: inbound rules were missing on the MORRIS firewall. Now I can ping MORRIS from OPEL and telnet to both ports, 9300 and 9200.

MORRIS has started:

[2021-02-19T11:41:11,134][INFO ][o.e.h.AbstractHttpServerTransport] [MORRIS] publish_address {10.146.114.89:9200}, bound_addresses {10.146.114.89:9200}
[2021-02-19T11:41:11,134][INFO ][o.e.n.Node               ] [MORRIS] started

For some reason, MORRIS was still unable to join the cluster. Then I remembered that I had earlier run:
POST /_cluster/voting_config_exclusions?node_names=MORRIS

I ran:
DELETE /_cluster/voting_config_exclusions
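The same DELETE can also be sent from a script. A minimal Python sketch using only the standard library, assuming security is disabled on the cluster (no credentials are sent) and that nodes are reachable by hostname as in the http://morris:9200 requests further down; build_clear_request and clear_voting_exclusions are hypothetical names. Voting configuration exclusions stay in the cluster state until they are explicitly cleared like this.

```python
import urllib.request

# Assumption: no authentication is required by the cluster.
EXCLUSIONS_PATH = "/_cluster/voting_config_exclusions"


def build_clear_request(base_url: str) -> urllib.request.Request:
    """Build a DELETE request that clears the voting config exclusions list."""
    url = base_url.rstrip("/") + EXCLUSIONS_PATH
    return urllib.request.Request(url, method="DELETE")


def clear_voting_exclusions(base_url: str, timeout: float = 10.0) -> int:
    """Send the DELETE and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so a
    returned value means the request was accepted.
    """
    req = build_clear_request(base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, clear_voting_exclusions("http://morris:9200") sends the same request as the console command above.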

And everything was fine:
GET http://morris:9200/

{
  "name" : "MORRIS",
  "cluster_name" : "AA-PT-Monitoring",
  "cluster_uuid" : "_UaiOyWxRSm3xgBYy62Vsw",
  "version" : {
    "number" : "7.3.1",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "4749ba6",
    "build_date" : "2019-08-19T20:19:25.651794Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

GET http://morris:9200/_cluster/health?pretty

{
  "cluster_name" : "AA-PT-Monitoring",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 21,
  "active_shards" : 42,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
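To check the same thing from a script, a minimal Python sketch that reads a _cluster/health response body and lists anything short of a healthy three-node cluster. check_health and its expected_nodes parameter are hypothetical names; the field names (status, number_of_nodes, unassigned_shards) are taken from the response shown above.

```python
import json
from typing import List


def check_health(body: str, expected_nodes: int = 3) -> List[str]:
    """Return a list of problems found in a _cluster/health JSON body.

    An empty list means: status green, the expected number of nodes
    present, and no unassigned shards.
    """
    health = json.loads(body)
    problems = []
    if health["status"] != "green":
        problems.append(f"status is {health['status']}")
    if health["number_of_nodes"] != expected_nodes:
        problems.append(
            f"{health['number_of_nodes']} nodes joined, expected {expected_nodes}"
        )
    if health["unassigned_shards"] > 0:
        problems.append(f"{health['unassigned_shards']} unassigned shards")
    return problems
```

Passing the response above returns an empty list. Fetching the body (for example with urllib.request.urlopen against http://morris:9200/_cluster/health) is left out so the sketch does not depend on network access.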

Thank you so much, Stephen!

system · March 19, 2021, 12:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.