Node not finding the master node


#1

Hello everyone!

I am using ES 5.6.2 and I created two nodes within a cluster.
I simply want to have one master node and one slave node.

Here are the configuration files for both of them:

  cluster.name: nomad-cluster

  node.name: nomad-node-fnico-test
  node.master: true
  node.data: true

  path.data: /var/data/elasticsearch
  path.logs: /var/log/elasticsearch

  network.host: 0.0.0.0
  network.bind_host: _global_

  http.port: 9200

#  transport.host: 130.183.198.7
#  transport.tcp.port: 9300-9400

  discovery.zen.ping.unicast.hosts: ["130.183.198.7:9200","130.183.198.71:9200"]
  discovery.zen.minimum_master_nodes: 1

and

  cluster.name: nomad-cluster

  node.name: nomad-node-fnico-test2
  node.master: false
  node.data: true

  path.data: /var/data/elasticsearch
  path.logs: /var/log/elasticsearch

# network.host: 130.183.198.7
# network.bind_host: 130.183.198.7
  network.host: 0.0.0.0
  network.bind_host: _global_
  http.port: 9200

#  discovery.zen.ping.multicast.enabled: false  -- not in ES5 !

#  transport.host: _global_
#  transport.tcp.port: 9300-9400
#  discovery.zen.ping.unicast.enabled: true
  discovery.zen.ping.unicast.hosts: ["130.183.198.7:9200","130.183.198.71:9200"]

  discovery.zen.minimum_master_nodes: 1

The nodes are on two different machines, connected over SSH with port forwarding on port 9200.

ES on the master node works fine, but not on the slave node. This happens:

[2017-10-12T14:01:17,912][INFO ][o.e.n.Node               ] [nomad-node-fnico-test2] initializing ...
[2017-10-12T14:01:18,178][INFO ][o.e.e.NodeEnvironment    ] [nomad-node-fnico-test2] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [15.8gb], net total_space [19gb], spins? [unknown], types [rootfs]
[2017-10-12T14:01:18,179][INFO ][o.e.e.NodeEnvironment    ] [nomad-node-fnico-test2] heap size [193.3mb], compressed ordinary object pointers [true]
[2017-10-12T14:01:18,180][INFO ][o.e.n.Node               ] [nomad-node-fnico-test2] node name [nomad-node-fnico-test2], node ID [zbTiVNMyTXqUmNykROtk7Q]
[2017-10-12T14:01:18,180][INFO ][o.e.n.Node               ] [nomad-node-fnico-test2] version[5.6.2], pid[23128], build[57e20f3/2017-09-23T13:16:45.703Z], OS[Linux/3.10.0-327.10.1.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_141/25.141-b15]
[2017-10-12T14:01:18,181][INFO ][o.e.n.Node               ] [nomad-node-fnico-test2] JVM arguments [-Xms200m, -Xmx200m, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/local/elasticsearch-5.6.2]
[2017-10-12T14:01:20,568][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [aggs-matrix-stats]
[2017-10-12T14:01:20,568][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [ingest-common]
[2017-10-12T14:01:20,568][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [lang-expression]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [lang-groovy]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [lang-mustache]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [lang-painless]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [parent-join]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [percolator]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [reindex]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [transport-netty3]
[2017-10-12T14:01:20,569][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] loaded module [transport-netty4]
[2017-10-12T14:01:20,570][INFO ][o.e.p.PluginsService     ] [nomad-node-fnico-test2] no plugins loaded
[2017-10-12T14:01:24,410][INFO ][o.e.d.DiscoveryModule    ] [nomad-node-fnico-test2] using discovery type [zen]
[2017-10-12T14:01:25,778][INFO ][o.e.n.Node               ] [nomad-node-fnico-test2] initialized
[2017-10-12T14:01:25,778][INFO ][o.e.n.Node               ] [nomad-node-fnico-test2] starting ...
[2017-10-12T14:01:26,073][INFO ][o.e.t.TransportService   ] [nomad-node-fnico-test2] publish_address {130.183.198.71:9300}, bound_addresses {130.183.198.71:9300}
[2017-10-12T14:01:26,098][INFO ][o.e.b.BootstrapChecks    ] [nomad-node-fnico-test2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-10-12T14:01:29,266][WARN ][o.e.d.z.ZenDiscovery     ] [nomad-node-fnico-test2] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2017-10-12T14:01:32,268][WARN ][o.e.d.z.ZenDiscovery     ] [nomad-node-fnico-test2] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2017-10-12T14:01:35,270][WARN ][o.e.d.z.ZenDiscovery     ] [nomad-node-fnico-test2] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2017-10-12T14:01:38,271][WARN ][o.e.d.z.ZenDiscovery     ] [nomad-node-fnico-test2] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2017-10-12T14:01:41,273][WARN ][o.e.d.z.ZenDiscovery     ] [nomad-node-fnico-test2] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
[2017-10-12T14:01:44,275][WARN ][o.e.d.z.ZenDiscovery     ] [nomad-node-fnico-test2] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again

What could be the issue here?

Thanks!


(David Pilato) #2

The problem is that you are using port 9200 where 9300 should be used instead.

Instead of

discovery.zen.ping.unicast.hosts: ["130.183.198.7:9200","130.183.198.71:9200"]

Write:

discovery.zen.ping.unicast.hosts: ["130.183.198.7:9300","130.183.198.71:9300"]

Or even easier:

discovery.zen.ping.unicast.hosts: ["130.183.198.7","130.183.198.71"]
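If you want to double-check which port speaks which protocol, a quick curl against each one tells you (using the IPs from your config; this needs the node to be up and reachable):

```shell
# 9200 is the HTTP port: it answers with a JSON banner
# (name, cluster_name, version, ...)
curl http://130.183.198.7:9200

# 9300 is the node-to-node transport port: it is not HTTP,
# so Elasticsearch replies "This is not a HTTP port"
curl http://130.183.198.7:9300
```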

But what do you mean by "master/slave"? We don't have that notion in Elasticsearch. What kind of problem do you want to solve, or think you need to solve?


#3

Yes, I changed the port to 9300 but the issue is the same.

I have two virtual machines with one instance of Elasticsearch 5.6.2 on each of them.
I would like a cluster with two nodes in it: one master node on one machine and one data node on the other.
The idea is that my indices would first be put on the master node, and the other data node would then be synced to the master node so that both have the same indices.


(David Pilato) #4

You are misunderstanding how Elasticsearch works. There are no master/slave nodes.

But there are primary shards and replica shards which can be allocated on any data node in the cluster.

On a 2-node cluster (well, you need at least 3 to avoid split brain), you don't need to change any settings like this.
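For the record, the safe value for `discovery.zen.minimum_master_nodes` is a strict majority of the master-eligible nodes, which is exactly why 2 nodes can't protect you from split brain. A quick sketch of the formula (shell arithmetic, just for illustration):

```shell
# minimum_master_nodes should be a majority of master-eligible nodes:
# floor(n / 2) + 1
minimum_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

minimum_master_nodes 3   # 2: a 3-node cluster survives losing one node
minimum_master_nodes 2   # 2: with 2 nodes, losing either one halts the cluster
```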


#5

I would start with an explicit network.host rather than 0.0.0.0. Someone else might be able to tell you whether 0.0.0.0 should work...

Can you curl or telnet to port 9200 and 9300 from one VM to the other?
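For example (assuming your netcat supports the -z flag; interactive telnet works just as well):

```shell
# -v verbose, -z scan without sending data;
# exit status 0 means the port is reachable
nc -vz 130.183.198.7 9200
nc -vz 130.183.198.7 9300
```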

(Also, setting one node to master does not automatically mean that the indices are first created there and then replicated to other nodes. Dedicated master nodes don't even store data.) ... Too slow :stuck_out_tongue:


#6

I will try the curl and telnet tests.
I think then that I would like my primary shards on one machine and my replica shards on another.


#7

For telnet, should I do telnet 130.183.198.7 9200? Because nothing happens when I do that. The port is probably not open.


#8

This is what I get when using telnet and curl

~ # telnet 192.168.1.110 9200
Trying 192.168.1.110...
Connected to 192.168.1.110.
Escape character is '^]'.

Connection closed by foreign host.
~ # telnet 192.168.1.110 9300
Trying 192.168.1.110...
Connected to 192.168.1.110.
Escape character is '^]'.

Connection closed by foreign host.
~ # curl http://192.168.1.110:9200
{
"name" : "node_name",
"cluster_name" : "logs",
"cluster_uuid" : "NydKtj3QSdqoiPjd189avQ",
"version" : {
"number" : "5.5.2",
"build_hash" : "b2f0c09",
"build_date" : "2017-08-14T12:33:14.154Z",
"build_snapshot" : false,
"lucene_version" : "6.6.0"
},
"tagline" : "You Know, for Search"
}
~ # curl http://192.168.1.110:9300
This is not a HTTP port

If these ports are not accessible between the VMs, Elasticsearch clustering will definitely not work.

And again, Elasticsearch does not work in the master/slave way you hope it does.

From https://www.elastic.co/guide/en/elasticsearch/reference/5.6/modules-node.html#master-node

The master node is responsible for lightweight cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node.

-AB


#9

Forgot to ask: what hypervisor are you using? If the services are running on your VMs, then check the network security settings. Although, as the VMs are in the same subnet, it really should work without any changes.


#10

I am working on a cloud computing service at my institute. I think I will just contact some colleagues who manage it. I need to thoroughly check all these network settings and make sure all the necessary ports are open.

As I said, my plan was just to mirror the index I have on one machine to another machine.


#11

A_B, what exactly is the output of your

netstat -alp

? I would like to compare it with mine.


#12

Hello again! I found a way to make a snapshot of my index and copy the backup directory to the other server.

Nonetheless, I would really like to find out how to make the nodes find each other. All my curl and telnet commands work, and the ssh connections work too.
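For anyone finding this later, the snapshot part is just the standard filesystem-repository snapshot API; the repository name, index name, and path below are only examples (the location must also be listed under path.repo in elasticsearch.yml):

```shell
# register a shared-filesystem snapshot repository
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' \
  -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/my_backup" }
}'

# snapshot a single index into that repository
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true' \
  -H 'Content-Type: application/json' -d '{
  "indices": "my_index"
}'
```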


#13

I think I will go for the Logstash solution with ES as both input and output.
I will open a new topic in the appropriate category.
If no one is able to answer my previous question, then I guess we can close this topic.


(andy_zhou) #14

Use this setting:

discovery.zen.ping.unicast.hosts: ["130.183.198.7","130.183.198.71"]

The cluster name needs to be the same on both nodes. Then check the output of:

curl http://192.168.1.110:9200/_cluster/health?pretty


#15

The issue here is not HTTP communication, it's the communication between the nodes, which uses port 9300, unlike HTTP which uses port 9200. My curls work when I connect with SSH and port forwarding like

ssh -L 9200:localhost:9200 root@

If I do the same for port 9300, my nodes still don't talk to each other.


#16

Here you go. I have two nodes on the same machine, so one is running on ports 9200/9300 and the other on 9201/9301.

~ # netstat -alp | grep 9[2,3]0[0,1]
tcp 0 0 hostname.b:34984 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:34983 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:34982 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:35443 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:34932 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:35439 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:35438 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:34937 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:35440 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:35291 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:36917 hostname.bt:9200 ESTABLISHED 554/node
tcp 0 0 hostname.b:35442 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:35441 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:35444 hostname.bt:9200 TIME_WAIT -
tcp 0 0 hostname.b:38128 hostname.bt:9200 ESTABLISHED 554/node
tcp6 0 0 hostname.bt:9201 [::]:* LISTEN 609/java
tcp6 0 0 hostname.bt:9300 [::]:* LISTEN 608/java
tcp6 0 0 hostname.bt:9301 [::]:* LISTEN 609/java
tcp6 0 0 hostname.bt:9200 [::]:* LISTEN 608/java
tcp6 0 0 hostname.b:53245 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9200 hostname.b:35291 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9201 hostname.b:41149 ESTABLISHED 609/java
tcp6 0 0 hostname.b:60067 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9300 hostname.b:53249 ESTABLISHED 608/java
tcp6 0 0 hostname.b:41171 hostname.bt:9201 ESTABLISHED 555/java
tcp6 0 0 hostname.bt:9200 hostname.b:36917 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9200 hostname.b:34933 ESTABLISHED 608/java
tcp6 0 0 hostname.b:34933 hostname.bt:9200 ESTABLISHED 555/java
tcp6 0 0 hostname.bt:9300 hostname.b:53240 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53242 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9200 hostname.b:34979 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9301 hostname.b:60071 ESTABLISHED 609/java
tcp6 0 0 hostname.b:34979 hostname.bt:9200 ESTABLISHED 555/java
tcp6 0 0 hostname.b:60068 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.b:41124 hostname.bt:9201 ESTABLISHED 555/java
tcp6 0 0 hostname.b:60062 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9300 hostname.b:53248 ESTABLISHED 608/java
tcp6 0 0 hostname.b:60061 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53243 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.b:34978 hostname.bt:9200 ESTABLISHED 555/java
tcp6 0 0 hostname.b:60066 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9301 hostname.b:60065 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9201 hostname.b:41153 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9300 hostname.b:53246 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9300 hostname.b:53244 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9301 hostname.b:60066 ESTABLISHED 609/java
tcp6 0 0 hostname.b:53250 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9301 hostname.b:60062 ESTABLISHED 609/java
tcp6 0 0 hostname.b:60073 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9200 hostname.b:34937 ESTABLISHED 608/java
tcp6 0 0 hostname.b:60065 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9200 hostname.b:34980 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9201 hostname.b:41124 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9300 hostname.b:53245 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9201 hostname.b:41151 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9200 hostname.b:34976 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9200 hostname.b:34983 ESTABLISHED 608/java
tcp6 0 0 hostname.b:34976 hostname.bt:9200 ESTABLISHED 555/java
tcp6 0 0 hostname.bt:9200 hostname.b:38128 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9200 hostname.b:34932 ESTABLISHED 608/java
tcp6 0 0 hostname.b:41151 hostname.bt:9201 ESTABLISHED 555/java
tcp6 0 0 hostname.b:60070 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.b:41149 hostname.bt:9201 ESTABLISHED 555/java
tcp6 0 0 hostname.bt:9300 hostname.b:53247 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53244 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9200 hostname.b:34984 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53241 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9300 hostname.b:53243 ESTABLISHED 608/java
tcp6 0 0 hostname.bt:9300 hostname.b:53252 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53249 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.b:60072 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53252 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9301 hostname.b:60064 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9301 hostname.b:60067 ESTABLISHED 609/java
tcp6 0 0 hostname.b:53248 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9300 hostname.b:53250 ESTABLISHED 608/java
tcp6 0 0 hostname.b:34980 hostname.bt:9200 ESTABLISHED 555/java
tcp6 0 0 hostname.bt:9301 hostname.b:60070 ESTABLISHED 609/java
tcp6 0 0 hostname.b:60071 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53240 hostname.bt:9300 ESTABLISHED 609/java
tcp6 0 0 hostname.bt:9200 hostname.b:34982 ESTABLISHED 608/java
tcp6 0 0 hostname.b:60064 hostname.bt:9301 ESTABLISHED 608/java
tcp6 0 0 hostname.b:53247 hostname.bt:9300 ESTABLISHED 609/java
....


#17

Here is my netstat result; it looks quite different :-/

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      28023/sshd          
tcp        0      0 localhost:smtp          0.0.0.0:*               LISTEN      822/master          
tcp        0      0 vm-130-123-123-7.cl:ssh mich.apg:37608          ESTABLISHED 26800/sshd: root@pt 
tcp        0      0 vm-130-123-123-7.cl:ssh mich.apg:37750          ESTABLISHED 26898/sshd: root@pt 
tcp6       0      0 localhost:wap-wsp       [::]:*                  LISTEN      26840/java          
tcp6       0      0 localhost:vrace         [::]:*                  LISTEN      26840/java          
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      28023/sshd          
tcp6       0      0 localhost:smtp          [::]:*                  LISTEN      822/master          
udp        0      0 0.0.0.0:44090           0.0.0.0:*                           650/dhclient        
udp        0      0 0.0.0.0:bootpc          0.0.0.0:*                           650/dhclient        
udp6       0      0 [::]:12260              [::]:*                              650/dhclient

(system) #18

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.