New Node Not Talking to Cluster

Bradly · February 25, 2016, 4:13pm

I added a new node yesterday afternoon and got to a point where it should have been communicating with the cluster, but it's not.

I receive this error on the new node in the elasticsearch.log file:
[2016-02-25 09:33:53,802][WARN ][transport.netty ] [Node-5] Message not fully read (response) for [2698] handler org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$4@2bc2db6d, error [true], resetting
[2016-02-25 09:33:53,803][WARN ][discovery.zen.ping.unicast] [KIBANA-ALPHA] failed to send ping to [[#zen_unicast_2#][Node-5][inet[/172.16.0.4:9300]]]
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:178)

I get this error on the other end (172.16.0.4):
[2016-02-25 09:34:24,545][WARN ][transport.netty ] [ELASTICSEARSH-ALPHA] exception caught on transport layer [[id: 0x9626bf34, /172.31.0.3:43864 => /172.31.0.65:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [2770], action [internal:discovery/zen/unicast_gte_1_4], readerIndex [59] vs expected [199]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived

After searching, I turned up this command:
curl 'localhost:9200/_nodes/jvm?pretty'

On the new node I get:
{
"cluster_name" : "ELASTICSEARCH",
"nodes" : {
"lAPD4ZLtT3Kzcj38e9hf-g" : {
"name" : "NODE-5",
"transport_address" : "inet[/172.16.0.5:9300]",
"host" : "KIBANA-ALPHA",
"ip" : "127.0.1.1",
"version" : "1.6.2",
"build" : "NA",
"http_address" : "inet[/172.16.0.5:9200]",
"attributes" : {
"data" : "false",
"master" : "false"
},
"jvm" : {
"pid" : 1801,
"version" : "1.7.0_95",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.95-b01",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1456413210741,
"mem" : {
"heap_init_in_bytes" : 268435456,
"heap_max_in_bytes" : 1065025536,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 1065025536
},
"gc_collectors" : [ "Copy", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Eden Space", "Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
}
}
}

On the other nodes I show:
{
"cluster_name" : "ELASTICSEARCH",
"nodes" : {
"xQXVmG3UTQKJ0lcgEjLR7g" : {
"name" : "NODE1",
"transport_address" : "172.16.0.1:9300",
"host" : "172.16.0.1",
"ip" : "172.16.0.1",
"version" : "2.2.0",
"build" : "8ff36d1",
"http_address" : "172.16.0.1:9200",
"jvm" : {
"pid" : 1490,
"version" : "1.8.0_72",
"vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
"vm_version" : "25.72-b15",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1456347785056,
"mem" : {
"heap_init_in_bytes" : 268435456,
"heap_max_in_bytes" : 1056309248,
"non_heap_init_in_bytes" : 2555904,
"non_heap_max_in_bytes" : 0,
"direct_max_in_bytes" : 1056309248
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Metaspace", "Compressed Class Space", "Par Eden Space", "Par Survivor Space", "CMS Old Gen" ],
"using_compressed_ordinary_object_pointers" : "true"
}
},

The only difference I see is in the transport address lines: the new server shows "transport_address" : "inet[/172.16.0.5:9300]", while the old servers show "transport_address" : "172.16.0.1:9300". Any help would be appreciated.

Brad

crickes · February 25, 2016, 4:22pm

Hi,

Your new node is trying to come up on ES 1.6.2 while the rest of your cluster is on 2.2.0. The new node is also running a much older version of Java (1.7.0_95 vs 1.8.0_72 on the rest of the cluster). If I were in your shoes, I would get the new node running the same version of both Java and ES before going any further, to rule out issues between incompatible versions.
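
If it helps to spot this kind of mismatch without reading the JSON by eye, here is a rough Python sketch that groups nodes by the "version" and "jvm.version" fields from the `_nodes/jvm` output. The function name `group_nodes_by_version` is made up for illustration, and the sample data is trimmed down from the two outputs posted above. Because the new node has not joined the cluster, it assumes you run the curl command on each side and pass in both responses.

```python
from collections import defaultdict


def group_nodes_by_version(responses):
    """Group node names by (Elasticsearch version, JVM version).

    `responses` is a list of parsed JSON bodies from
    `curl 'localhost:9200/_nodes/jvm?pretty'`, one per host queried.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    groups = defaultdict(list)
    for response in responses:
        for node in response.get("nodes", {}).values():
            es_version = node.get("version", "unknown")
            jvm_version = node.get("jvm", {}).get("version", "unknown")
            groups[(es_version, jvm_version)].append(node.get("name", "unknown"))
    return {key: sorted(names) for key, names in groups.items()}


if __name__ == "__main__":
    # Trimmed copies of the two outputs in this thread.
    new_node = {"nodes": {"lAPD4ZLtT3Kzcj38e9hf-g": {
        "name": "NODE-5", "version": "1.6.2", "jvm": {"version": "1.7.0_95"}}}}
    cluster = {"nodes": {"xQXVmG3UTQKJ0lcgEjLR7g": {
        "name": "NODE1", "version": "2.2.0", "jvm": {"version": "1.8.0_72"}}}}

    groups = group_nodes_by_version([new_node, cluster])
    for (es_version, jvm_version), names in sorted(groups.items()):
        print(f"ES {es_version} / Java {jvm_version}: {', '.join(names)}")
    if len(groups) > 1:
        print("Version mismatch: not every node runs the same ES and Java.")
```

More than one group in the result means the nodes are not all on the same ES and Java versions, which is the first thing to rule out when you see transport serialization errors like the ones above.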

Steve

Bradly · February 25, 2016, 7:57pm

Awesome. Updated the new one to 2.2.0 and off it went. Didn't even see that when I was reading through it.

Thanks,
Brad

crickes · February 25, 2016, 8:06pm

If you haven't done so yet, I'd recommend upgrading the Java version on the new node to match the rest of your cluster also.

Bradly · February 25, 2016, 8:32pm

I will get that done this evening when the firewalls are not shooting quite so much data at the cluster. Do you recommend OpenJDK or Oracle? I think right now I am running OpenJDK, but if I am going to upgrade, it might be a good time to make the switch if there is a solid argument for a change.

crickes · February 25, 2016, 8:48pm

I haven't tried OpenJDK. My cluster is currently running jre-8u72-linux-x64.rpm and I haven't had any issues. I believe they are up to release jre-8u74-linux-x64.rpm.
