I'm trying to build an Elasticsearch cluster, but a node fails to join it with an error.
Log from the master node (all IPs and comments are omitted for privacy):
[2020-06-23T16:33:47,361][WARN ][o.e.c.c.Coordinator ] [kn-log-01] failed to validate incoming join request from node [{kn-log-02}{tuCA1_YARK-HkHyzbpG4Nw}{0yZHEJGAQpKgWw336U2vDQ}{127.0.0.2}{127.0.0.2:9300}{dilrt}{ml.machine_memory=134888939520, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [kn-log-02][127.0.0.2:9300][internal:cluster/coordination/join/validate] request_id [88] timed out after [59835ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1041) [elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.7.0.jar:7.7.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Log from the data node that is trying to join:
org.elasticsearch.transport.RemoteTransportException: [kn-log-01][127.0.0.1:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:514) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1139) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1139) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.transport.TransportService$8.run(TransportService.java:1001) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.7.0.jar:7.7.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.transport.NodeDisconnectedException: [kn-log-02][127.0.0.2:9300][internal:cluster/coordination/join/validate] disconnected
[2020-06-23T16:41:47,433][WARN ][o.e.c.c.ClusterFormationFailureHelper] [kn-log-02] master not discovered yet: have discovered [{kn-log-02}{tuCA1_YARK-HkHyzbpG4Nw}{0yZHEJGAQpKgWw336U2vDQ}{127.0.0.2}{127.0.0.2:9300}{dilrt}{ml.machine_memory=134888939520, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; discovery will continue using [127.0.0.1:9300, 127.0.0.3:9300, 127.0.0.4:9300] from hosts providers and [] from last-known cluster state; node term 1, last-accepted version 0 in term 0
[2020-06-23T16:41:57,434][WARN ][o.e.c.c.ClusterFormationFailureHelper] [kn-log-02] master not discovered yet: have discovered [{kn-log-02}{tuCA1_YARK-HkHyzbpG4Nw}{0yZHEJGAQpKgWw336U2vDQ}{127.0.0.2}{127.0.0.2:9300}{dilrt}{ml.machine_memory=134888939520, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; discovery will continue using [127.0.0.1:9300, 127.0.0.3:9300, 127.0.0.4:9300] from hosts providers and [] from last-known cluster state; node term 1, last-accepted version 0 in term 0
The node retries the join request every minute, but each attempt fails with a timeout error. It doesn't work now, but it worked yesterday, and as far as I know I haven't changed any Elasticsearch settings since then.
elasticsearch.yml for the master node:
cluster.name: mycluster
node.name: kn-log-01
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.seed_hosts: ["127.0.0.1", "127.0.0.2", "127.0.0.3", "127.0.0.4"]
cluster.initial_master_nodes: ["kn-log-01"]
node.master: true
node.data: true
elasticsearch.yml for the data node:
cluster.name: mycluster
node.name: kn-log-02
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.seed_hosts: ["127.0.0.1", "127.0.0.2", "127.0.0.3", "127.0.0.4"]
cluster.initial_master_nodes: ["kn-log-01"]
node.master: false
node.data: true
$ curl -XGET 127.0.0.1:9200
{
"name" : "kn-log-01",
"cluster_name" : "mycluster",
"cluster_uuid" : "jN-0FJwDRZqlAtQ6LpXwug",
"version" : {
"number" : "7.7.0",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
"build_date" : "2020-05-12T02:01:37.602180Z",
"build_snapshot" : false,
"lucene_version" : "8.5.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
$ curl -XGET 127.0.0.2:9200
{
"name" : "kn-log-02",
"cluster_name" : "mycluster",
"cluster_uuid" : "_na_",
"version" : {
"number" : "7.7.0",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
"build_date" : "2020-05-12T02:01:37.602180Z",
"build_snapshot" : false,
"lucene_version" : "8.5.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
$ curl -XGET 127.0.0.1:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1 15 2 0 0.01 0.03 0.05 dilmrt * kn-log-01
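The comparison I am reading off the two root-endpoint responses above can be sketched in Python as follows. This is only a minimal sketch: it assumes that a cluster_uuid of "_na_" means the node has not joined any cluster yet, and the names has_joined_cluster and fetch_node_info are illustrative, not part of any Elasticsearch client.

```python
import json
import sys
import urllib.request

# Assumption: "_na_" is the placeholder a node reports before it joins a cluster.
UNJOINED_UUID = "_na_"


def has_joined_cluster(info: dict) -> bool:
    """Return True if the root-endpoint response carries a real cluster UUID."""
    uuid = info.get("cluster_uuid", UNJOINED_UUID)
    return uuid != UNJOINED_UUID


def fetch_node_info(host: str, port: int = 9200, timeout: float = 5.0) -> dict:
    """GET http://host:port/ and parse the JSON body, like the curl calls above."""
    with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hosts are passed on the command line, e.g. the two node addresses.
    for host in sys.argv[1:]:
        info = fetch_node_info(host)
        state = "joined" if has_joined_cluster(info) else "NOT joined"
        print(f"{info.get('name')}: cluster_uuid={info.get('cluster_uuid')} -> {state}")
```

With the responses shown above, kn-log-01 would count as joined and kn-log-02 as not joined, which matches the _cat/nodes output listing only kn-log-01.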
What I have already tried:
- Rechecked the firewalld settings for ports 9200 and 9300.
- Rebooted all machines.
- Wiped the Elasticsearch data folders and restarted the services.
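The port check from the first item can be repeated from each machine toward every other node with a small script like the one below. It is a rough sketch, not an official tool: check_port is a hypothetical helper name, the 3-second timeout is an arbitrary choice, and the host list has to be supplied on the command line because the real IPs are omitted here.

```python
import socket
import sys

# HTTP (9200) and transport (9300) ports, as used in the configuration above.
PORTS = (9200, 9300)


def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Usage: python check_ports.py <host1> <host2> ...
    for host in sys.argv[1:]:
        for port in PORTS:
            status = "open" if check_port(host, port) else "unreachable"
            print(f"{host}:{port} {status}")
```

A successful connection only shows that the port accepts TCP connections from that machine. It does not prove that a full join validation request and its response get through, so it has to be run in both directions (master to data node and data node to master).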