Configure cluster

talial · March 4, 2017, 7:12pm

Hi,
I created 3 VMs with Elasticsearch installed.
I want to create a cluster with one master and 2 data nodes.
I configured the yml as follows:
master node:
cluster.name: elastic
node.name: master_elk
node.master: true
node.data: false
network.host: the ip of the master machine
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.unicast.hosts: ["master IP", "data 1 ip", "data2 ip"]
discovery.zen.minimum_master_nodes: 1
data 1+2:
cluster.name: elasic
node.name: data1
node.master: true
node.data: true
network.host: the ip of the data machine
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.unicast.hosts: ["master IP", "data 1 ip", "data2 ip"]
discovery.zen.minimum_master_nodes: 1

When I telnet from the master to the data machine's IP on port 9200 I get a connection, but on port 9300 I don't.
When I run a health check on the cluster I get status red.

Any help would be great.
Thanks,
Talia
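
One thing worth checking in the settings as posted: the master block uses cluster.name: elastic while the data block uses cluster.name: elasic. Nodes only join a cluster whose name matches their own, so if the real files differ this way the data nodes would not find the master. Below is a minimal Python sketch of that check, assuming the relevant settings have been copied by hand into dictionaries. The group_by_cluster_name helper and the dictionary layout are illustrative and not part of Elasticsearch.

```python
from collections import defaultdict

# Settings copied by hand from the post above; only the relevant key is kept.
node_settings = {
    "master_elk": {"cluster.name": "elastic"},
    "data1": {"cluster.name": "elasic"},
}


def group_by_cluster_name(settings_by_node):
    """Group node names by their cluster.name; more than one group means a mismatch."""
    groups = defaultdict(list)
    for node, settings in settings_by_node.items():
        groups[settings.get("cluster.name")].append(node)
    return dict(groups)


groups = group_by_cluster_name(node_settings)
if len(groups) > 1:
    print("cluster.name differs between nodes:", groups)
else:
    print("cluster.name is consistent:", groups)
```

If the typo exists only in the forum post and not in the actual files, this check passes and the cause lies elsewhere.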

talial · March 4, 2017, 8:00pm

I forgot to add that the Elasticsearch version I installed is 5.2.

curl -XGET 'http://XXXX:9200/_nodes/transport?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elastic",
  "nodes" : {
    "H4pskB4fQOOmJT4m74goIg" : {
      "name" : "master_elk",
      "transport_address" : "127.0.0.1:9300",
      "host" : "localhost",
      "ip" : "127.0.0.1",
      "version" : "5.2.0",
      "build_hash" : "24e05b9",
      "roles" : [
        "master",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "127.0.0.1:9300",
          "[::1]:9300"
        ],
        "publish_address" : "127.0.0.1:9300",
        "profiles" : { }
      }
    }
  }
}

warkolm · March 5, 2017, 4:42am

Just set network.host and you should be ok.

talial · March 5, 2017, 12:02pm

Still not working.

dadoonet · March 5, 2017, 1:24pm

Logs?

talial · March 5, 2017, 2:12pm

When I set only network.host, without transport.host, Elasticsearch does not start.

In the log file I have:
[2017-03-05T16:00:43,997][INFO ][o.e.t.TransportService ] [master] publish_address {xxxx:9300}, bound_addresses {xxxx:9300}
[2017-03-05T16:00:44,004][INFO ][o.e.b.BootstrapChecks ] [master] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-03-05T16:00:44,006][ERROR][o.e.b.Bootstrap ] [master] node validation exception
bootstrap checks failed
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

jasontedor · March 5, 2017, 3:22pm

This tells you exactly what the problem is. Earlier in your logs there will be output telling you why the system call filters failed to install. Often it's due to your kernel not supporting them. You have two options: change to a kernel that supports the seccomp features that are needed here, or disable system call filters at your own risk. This is covered in the docs.

talial · March 5, 2017, 5:33pm

OK, Thanks

Just to test discovery, I added this line to the yml file:
bootstrap.system_call_filter: false
Now I can telnet to ports 9200 and 9300,
but when I try to run the discovery I get a timeout:

curl -XGET 'machine_ip:9200/_cat/health?v&pretty'

I tried adding these lines to the yml:
discovery.zen.join_timeout: 90s
discovery.zen.ping_timeout: 30s

I still get the timeout:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

In the log:
[WARN ][o.e.n.Node ] [master] timed out while waiting for initial discovery state - timeout: 30s
[INFO ][o.e.h.HttpServer ] [master] publish_address {machine ip:9200}, bound_addresses {machine ip:9200}
[INFO ][o.e.n.Node ] [master] started
[INFO ][o.e.d.z.ZenDiscovery ] [master] failed to send join request to master [{master}{H4pskB4fQOOmJT4m74goIg}{vpQbi0o7R_mog2jiRpv0yg}{machine ip}{machine ip:9300}], reason [RemoteTransportException[[master][machine ip:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{master}{H4pskB4fQOOmJT4m74goIg}{mwU8gjwgQWiyRr_xG3FlHg}{machine ip}{machine ip:9300}] not master for join request]; ], tried [3] times
[2017-03-05T19:27:37,324][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [master] no known master node, scheduling a retry
[2017-03-05T19:28:07,325][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [master] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2017-03-05T19:28:07,326][WARN ][r.suppressed ] path: /_cat/health, params: {pretty=, v=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:211) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:307) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:237) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1157) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:527) [elasticsearch-5.2.0.jar:5.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]

Thanks for the replies.

jasontedor · March 6, 2017, 1:55am

Your configuration above shows:

transport.host: localhost

for the master node. You need to bind both the master and the data node to non-loopback interfaces or they will not be able to connect with each other across your network.
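
To illustrate the point, the /_nodes/transport output posted earlier in the thread reports a publish_address of 127.0.0.1:9300 for master_elk. The Python sketch below flags such loopback publish addresses using the standard ipaddress module. The is_loopback helper is a name made up for this example, and it assumes the host part is a literal IP address (a hostname such as "localhost" would raise ValueError).

```python
import ipaddress

# Transport publish address as reported by GET /_nodes/transport earlier in the thread.
published = {
    "master_elk": "127.0.0.1:9300",
}


def is_loopback(publish_address):
    """Return True if the host part of 'host:port' is a loopback IP address."""
    # Split off the port, then drop the brackets used around IPv6 literals.
    host = publish_address.rsplit(":", 1)[0].strip("[]")
    return ipaddress.ip_address(host).is_loopback


for name, address in published.items():
    if is_loopback(address):
        print(f"{name} publishes {address}: other machines cannot connect to it")
    else:
        print(f"{name} publishes {address}: not a loopback address")
```

A node that publishes a loopback address can only be reached from its own machine, which matches the telnet failure on port 9300 described in the first post.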

talial · March 6, 2017, 12:41pm

I changed the yml and set transport.host to the machine IP.
In the log I can see:
bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks

But when I run curl -XGET 'machine_ip:9200/_cat/health?v&pretty'
I still get the same errors in the log.

Maybe there is something I have to configure in the /etc/hosts file?
What am I missing?

iti · March 8, 2017, 5:01am

Can you telnet from the data nodes to the master on ports 9200 and 9300?
Also, since you have 3 master nodes, set:
discovery.zen.minimum_master_nodes: 2
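
The value 2 follows from the usual majority rule for this setting: (number of master-eligible nodes / 2) + 1, using integer division. A small Python sketch of the arithmetic (the function name is chosen for this example only):

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum of master-eligible nodes: (master_eligible // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible nodes, as in the first configuration,
# where the data nodes also had node.master: true.
print(minimum_master_nodes(3))

# A single master-eligible node, i.e. node.master: false on the data nodes.
print(minimum_master_nodes(1))
```

With only one master-eligible node the result is 1, but the cluster then has no master to fail over to if that node goes down.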

talial · March 8, 2017, 10:58am

Thanks for the reply.

Telnet works from all 3 servers.

I also checked these and they are OK:
curl 10.15.20.10:9200
curl 10.15.20.11:9200
curl 10.15.20.12:9200

[elasticsearch@master ~]$ curl -XGET 'http://10.15.20.10:9200/_nodes/transport?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elkcl",
  "nodes" : {
    "H4pskB4fQOOmJT4m74goIg" : {
      "name" : "master",
      "transport_address" : "10.15.20.10:9300",
      "host" : "10.15.20.10",
      "ip" : "10.15.20.10",
      "version" : "5.2.0",
      "build_hash" : "24e05b9",
      "roles" : [
        "master",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "10.15.20.10:9300"
        ],
        "publish_address" : "10.15.20.10:9300",
        "profiles" : { }
      }
    }
  }
}

[elasticsearch@ptktl-elkdev2 ~]$ curl -XGET 'http://10.15.20.11:9200/_nodes/transport?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elkcl",
  "nodes" : {
    "H4pskB4fQOOmJT4m74goIg" : {
      "name" : "ptktl-elkdev2",
      "transport_address" : "10.15.20.11:9300",
      "host" : "10.15.20.11",
      "ip" : "10.15.20.11",
      "version" : "5.2.0",
      "build_hash" : "24e05b9",
      "roles" : [
        "data",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "10.15.20.11:9300"
        ],
        "publish_address" : "10.15.20.11:9300",
        "profiles" : { }
      }
    }
  }
}

[elasticsearch@ptktl-elkdev2 ~]$ curl -XGET 'http://10.15.20.12:9200/_nodes/transport?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elkcl",
  "nodes" : {
    "H4pskB4fQOOmJT4m74goIg" : {
      "name" : "ptktl-elkdev3",
      "transport_address" : "10.15.20.12:9300",
      "host" : "10.15.20.12",
      "ip" : "10.15.20.12",
      "version" : "5.2.0",
      "build_hash" : "24e05b9",
      "roles" : [
        "data",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "10.15.20.12:9300"
        ],
        "publish_address" : "10.15.20.12:9300",
        "profiles" : { }
      }
    }
  }
}

I want to have one master and 2 data machines.
I configured the yml as follows:

master node:
cluster.name: elastic
node.name: master
node.master: true
node.data: false
bootstrap.system_call_filter: false
network.host: 10.15.20.10
transport.host: 10.15.20.10
transport.tcp.port: 9300
http.port: 9200
network.publish_host: 10.15.20.10
discovery.zen.ping.unicast.hosts: ["10.15.20.10:9300", "10.15.20.11:9300", "10.15.20.12:9300"]
discovery.zen.minimum_master_nodes: 1
discovery.zen.join_timeout: 90s
discovery.zen.ping_timeout: 90s

data 1+2:
cluster.name: elasic
node.name: data1
node.master: false
node.data: true
bootstrap.system_call_filter: false
transport.host: 10.15.20.11
transport.tcp.port: 9300
http.port: 9200
network.host: 10.15.20.11
network.publish_host: 10.15.20.11
discovery.zen.ping.unicast.hosts: ["10.15.20.10:9300", "10.15.20.11:9300", "10.15.20.12:9300"]
discovery.zen.minimum_master_nodes: 1
discovery.zen.join_timeout: 90s
discovery.zen.ping_timeout: 90s

[elasticsearch@master ~]$ curl http://10.15.20.10:9200/_cluster/health?pretty=true
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

In the log:
[INFO ][o.e.t.TransportService ] [master] publish_address {10.15.20.10:9300}, bound_addresses {10.15.20.10:9300}
[INFO ][o.e.b.BootstrapChecks ] [master] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[WARN ][o.e.n.Node ] [master] timed out while waiting for initial discovery state - timeout: 30s
[INFO ][o.e.h.HttpServer ] [master] publish_address {10.15.20.10:9200}, bound_addresses {10.15.20.10:9200}
[INFO ][o.e.n.Node ] [master] started
[DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [master] no known master node, scheduling a retry
[DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [master] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[WARN ][r.suppressed ] path: /_cluster/health, params: {pretty=true}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:211) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:307) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:237) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1157) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:527) [elasticsearch-5.2.0.jar:5.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
[INFO ][o.e.d.z.ZenDiscovery ] [master] failed to send join request to master [{master}{H4pskB4fQOOmJT4m74goIg}{zEXhVspzSuy69JgbEZbRoQ}{10.15.20.10}{10.15.20.10:9300}], reason [RemoteTransportException[[master][10.15.20.10:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{master}{H4pskB4fQOOmJT4m74goIg}{Y0Bi0UqcT-e0mHFC5OYasg}{10.15.20.10}{10.15.20.10:9300}] not master for join request]; ], tried [3] times

Any suggestions?
Also, if I change discovery.zen.minimum_master_nodes to 2 it still doesn't work.

Please advise.

talial · March 9, 2017, 3:51pm

Hi,
Now I have this in the log file on the data servers:
found existing node with the same id but is a different node instance

While looking into this error I found this explanation:
"I think that you copied the data folder from one to the other. In particular, this means that the node ID was copied along with it, and we do not allow two nodes with the same ID to join the cluster."

Maybe this is my problem?

How can I check the node ID?
How can I change the ID if it's the same?

In my reply above I posted the output of:
curl -XGET 'http://10.15.20.12:9200/_nodes/transport?pretty' from each machine.
Is the node ID the key under "nodes":
"nodes" : {
"H4pskB4fQOOmJT4m74goIg"
If so, how can I change it?

Please advise.
Thanks,
Talia
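
In the /_nodes/transport output, the key directly under "nodes" is the node ID, and the three responses posted earlier in the thread all show the same key. The Python sketch below makes that comparison explicit. The responses are abbreviated by hand from those posts, and find_duplicate_node_ids is a helper name invented for this example.

```python
from collections import defaultdict

# Abbreviated GET /_nodes/transport responses from the three machines in this thread.
responses = {
    "10.15.20.10": {"nodes": {"H4pskB4fQOOmJT4m74goIg": {"name": "master"}}},
    "10.15.20.11": {"nodes": {"H4pskB4fQOOmJT4m74goIg": {"name": "ptktl-elkdev2"}}},
    "10.15.20.12": {"nodes": {"H4pskB4fQOOmJT4m74goIg": {"name": "ptktl-elkdev3"}}},
}


def find_duplicate_node_ids(responses_by_host):
    """Map each node ID seen on more than one host to the (host, name) pairs using it."""
    seen = defaultdict(list)
    for host, body in responses_by_host.items():
        for node_id, info in body["nodes"].items():
            seen[node_id].append((host, info["name"]))
    return {node_id: users for node_id, users in seen.items() if len(users) > 1}


for node_id, users in find_duplicate_node_ids(responses).items():
    print(f"node ID {node_id} is shared by: {users}")
```

If the data directory was indeed copied between machines, as the quoted explanation suggests, a common remedy is to stop the copied nodes and clear their data path (path.data) before restarting so that each one generates a fresh ID. That is an assumption about this setup, and it discards whatever data those nodes hold.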

system · April 6, 2017, 3:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
