Missing Client Nodes in v5.6


(Aditya Sharma) #1

Hi, I upgraded to Elasticsearch 5.6. Now, when I put node.client: true in the yml file, it says there is no such setting as client. When I searched online, I couldn't find anything about a client node in v5.6; instead I found an ingest node.

Are the client node and the ingest node the same? How can I configure a node to act as a client/ingest?


(Christian Dahlqvist) #2

What used to be called client node is now referred to as coordinating only node.
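For reference, a coordinating only node in 5.x is a node with the master, data and ingest roles all switched off. The sketch below only renders those three settings as elasticsearch.yml style lines. The function names are made up for illustration, and the setting names assume the 5.x role flags (node.master, node.data, node.ingest).

```python
def coordinating_only_settings():
    """Return the role settings that leave a 5.x node with coordination only."""
    # Assumption: Elasticsearch 5.x role flags. Every role defaults to true,
    # so each one has to be disabled explicitly.
    return {
        "node.master": False,
        "node.data": False,
        "node.ingest": False,
    }


def to_yml_lines(settings):
    """Render a flat settings dict as elasticsearch.yml style lines."""
    return [f"{key}: {str(value).lower()}" for key, value in settings.items()]


if __name__ == "__main__":
    print("\n".join(to_yml_lines(coordinating_only_settings())))
```

The printed lines can be compared against the node's own elasticsearch.yml; there is no node.client setting among them.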


(Aditya Sharma) #3

Hi, thanks for that. I followed the documentation, but after setting the node up as coordinating only, the cluster_uuid shows up as _na_. Any idea why?


(Christian Dahlqvist) #4

Can you share your configuration? Where does it show up as na?


(Aditya Sharma) #5

Yeah, sure. Here is the curl request I made:

[root@es-client-01 ec2-user]# curl http://22.xx.xx.21:9200
{
  "name" : "es-client-01",
  "cluster_name" : "atom",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "5.6.0",
    "build_hash" : "781a835",
    "build_date" : "2017-09-07T03:09:58.087Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

I am also not able to access the client node over its public IP on port 9200, but when I make it a master node, I can connect to it on the same port and the same public IP.


(Aditya Sharma) #6

@Christian_Dahlqvist I posted a question on Stack Overflow, because there is a character limit here and the yml file is too large for it. The link is: Can't find nodes

Am I missing something?


(Christian Dahlqvist) #7

What is the configuration of the data nodes? What do you get if you call the cat nodes API? You can put the configurations into a gist and link to it here.


(Aditya Sharma) #8

@Christian_Dahlqvist I have uploaded the files to OneDrive here. The file names are prefixed with client, master and data respectively. I hope they provide more insight. Please let me know if you need anything else.


(Aditya Sharma) #9

@Christian_Dahlqvist When I call the _cat/nodes API via curl on the client node, I get a master_not_discovered_exception. Below is the exact response:

[root@es-client-01 ec2-user]# curl http://22.0.6.82:9200/_cat/nodes?v
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}[1]+  Done                    curl http://22.0.6.82:9200/_cat/nodes?v

When I make the same call to the master node, I get a valid response:

[root@es-master-01 ec2-user]# curl http://22.0.6.81:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
22.0.6.64            2          31   2    0.07    0.04     0.01 m         *      es-master-01

(Christian Dahlqvist) #10

It looks like you only have 3 nodes in your cluster. If that is the case, why are you using dedicated node types? For small clusters I always recommend letting all nodes have all roles and setting minimum_master_nodes to 2. Just because you can have dedicated node types does not mean that it is a good idea.

That does not, however, explain why your cluster is not forming. Have you verified that you can telnet to port 9300 from each of the hosts to the other hosts, and that there is no firewall preventing connectivity?
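The usual rule for discovery.zen.minimum_master_nodes is a quorum of the master-eligible nodes, (N / 2) + 1 with integer division, which gives 2 for three nodes. A minimal sketch of that arithmetic follows; the function name is made up for illustration.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Print the quorum for a few cluster sizes.
    for count in (1, 2, 3, 5):
        print(count, "master-eligible ->", minimum_master_nodes(count))
```

Note that with only two master-eligible nodes the quorum is also 2, so losing either node leaves the cluster without a master.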


(Aditya Sharma) #11

I am not able to telnet to port 9300; below is what it says. Surprisingly, it says the same for 9200, even though a few days back I was able to do it on the same setup.

[root@es-master-01 ec2-user]# telnet http://22.0.6.82 9300
telnet: http://22.0.6.82: Name or service not known
http://22.0.6.21: Unknown host
[root@es-master-01 ec2-user]# curl http://22.0.6.82:9300
[root@es-master-01 ec2-user]# telnet http://22.0.6.82 9200
telnet: http://22.0.6.82: Name or service not known
http://22.0.6.21: Unknown host
[root@es-master-01 ec2-user]#

Why does it say unknown host?

On the other hand, letting all nodes play all roles, as you suggested, sounds good to me; I just need to get them to join the cluster. Also, at how many nodes would you recommend moving to dedicated master, client and data nodes?


(Christian Dahlqvist) #12

Why do you have http here? If you are not able to connect to the other nodes on port 9300, which is not an HTTP port, that will explain why the nodes cannot connect.
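The same reachability check can be scripted as a plain TCP connect, which takes a bare host name or IP address and a port, with no http:// prefix. This is only a sketch using the Python standard library; the function name and the default timeout are arbitrary choices, not anything Elasticsearch provides.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with-block closes it again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running can_connect("22.0.6.82", 9300) and can_connect("22.0.6.82", 9200) from es-master-01 would test the same two ports as the telnet attempts in this thread.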


(Aditya Sharma) #13

@Christian_Dahlqvist Oh, that's just silly of me. Below is the updated telnet from the master to the coordinating only node:

[root@es-master-01 ec2-user]# telnet 22.0.6.82 9300
Trying 22.0.6.82...
^C
[root@es-master-01 ec2-user]# telnet 22.0.6.82 9200
Trying 22.0.6.82...
Connected to 22.0.6.82.
Escape character is '^]'.
^C
Connection closed by foreign host.
[root@es-master-01 ec2-user]#

As you can see, I don't get any response on port 9300, while 9200 works perfectly fine.

I have the node's firewall turned off as well.


(Christian Dahlqvist) #14

There is something preventing the connection on port 9300, which is the reason the cluster can not form.


(Aditya Sharma) #15

@Christian_Dahlqvist Is it network related or is it the configuration? I've checked the Java version and the OS version (I know it doesn't matter, but I did anyway), turned off the firewall, tried using the public IPs of the nodes instead of the private IPs, and added a port number in unicast.hosts. I also gave both nodes all roles, and they still won't connect. A telnet to port 9300 should connect, but for now it doesn't. I've been stuck on this for almost a week now.

Any help would be much appreciated.

On the other hand, at how many nodes should we move to dedicated master, client and data nodes?


(Christian Dahlqvist) #16

If the node you are trying to connect to is running and you can not telnet to it, it is probably network related.


(Aditya Sharma) #17

@Christian_Dahlqvist Okay, so I opened up port 9300, which was what kept the nodes from discovering each other. Now that they do, I face a different error. Below is a snippet of it:

[2017-09-18T04:35:13,372][INFO ][o.e.d.z.ZenDiscovery     ] [es-client-01] failed to send join request to master [{es-master-01}{WuPkAPBVTJGIZdwsjUnujw}{OfaEGmFlQa2LikWrWa6MnQ}{22.0.6.64}{22.0.6.64:9300}], reason [RemoteTransportException[[es-master-01][22.0.6.64:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {es-client-01}{WuPkAPBVTJGIZdwsjUnujw}{fz0O5iQ5Q7OwPEKfw8VxDA}{22.0.6.21}{22.0.6.21:9300}, found existing node {es-master-01}{WuPkAPBVTJGIZdwsjUnujw}{OfaEGmFlQa2LikWrWa6MnQ}{22.0.6.64}{22.0.6.64:9300} with the same id but is a different node instance]; ]

It fails to send a join request to the master.
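In the log line above, both node descriptors carry the same second field, WuPkAPBVTJGIZdwsjUnujw, which is the "same id" the message complains about. The sketch below pulls the descriptors out of such a line and reports ids that appear under more than one node name. It assumes 5.x prints a node as {name}{node id}{ephemeral id}{host}{address}; the function names are made up for illustration.

```python
import re

# Assumption: a 5.x node descriptor is five consecutive {...} fields:
# {name}{node id}{ephemeral id}{host}{address}.
NODE_PATTERN = re.compile(r"\{([^{}]*)\}" * 5)


def node_ids_by_name(log_line):
    """Map each node name found in log_line to the set of node ids printed for it."""
    ids = {}
    for name, node_id, _ephemeral, _host, _address in NODE_PATTERN.findall(log_line):
        ids.setdefault(name, set()).add(node_id)
    return ids


def shared_node_ids(log_line):
    """Return node ids that appear under more than one node name."""
    owners = {}
    for name, node_ids in node_ids_by_name(log_line).items():
        for node_id in node_ids:
            owners.setdefault(node_id, set()).add(name)
    return {node_id: sorted(names) for node_id, names in owners.items() if len(names) > 1}
```

A duplicated node id like this is often reported when a data directory or machine image was copied from one node to another; whether that applies here is not confirmed in this thread.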


(poojagupta) #18

This error seems to happen when the host sending the join request is able to reach the cluster on port 9300, but another node in the cluster is not able to accept the request.
To rule this out, make sure you have opened ports 9300 and 9200 on all nodes, so that they can form the cluster and communicate within it.


(Aditya Sharma) #19

@poojagupta Thanks for joining in on this. I have all my ports opened up, and I can telnet to 9300 and 9200 from either node to the other.

I may sound stupid here, but is it possible that it just gives up if it hits an inactive state? I faced this issue with Logstash; after I set the Logstash inactivity period to 1 day, it started working. Is there any possibility that something similar is happening here?


(Aditya Sharma) #20

@poojagupta Quite surprisingly, I am now just trying to get 2 nodes to join a cluster, with both nodes taking all roles.

The funny thing is, their logs have the same error, with only the host names swapped. Below is the error from the other node (named es-master-01):

[2017-09-18T05:26:54,805][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-01] failed to send join request to master [{es-client-01}{WuPkAPBVTJGIZdwsjUnujw}{jf_rnw_YQMqY_VUhqSd1dQ}{22.0.6.21}{22.0.6.21:9300}], reason [RemoteTransportException[[es-client-01][22.0.6.21:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {es-master-01}{WuPkAPBVTJGIZdwsjUnujw}{Ljp3Ltc2ROS7rR-Ax2jp1w}{22.0.6.64}{22.0.6.64:9300}, found existing node {es-client-01}{WuPkAPBVTJGIZdwsjUnujw}{jf_rnw_YQMqY_VUhqSd1dQ}{22.0.6.21}{22.0.6.21:9300} with the same id but is a different node instance]; ] 

I checked, there is only one instance of elasticsearch running on my nodes.