New Node Not Talking to Cluster


(Brad Birdwell) #1

I added a new node yesterday afternoon and got to the point where it should be communicating with the cluster, but it's not.

I receive this error on the new node in the elasticsearch.log file:
[2016-02-25 09:33:53,802][WARN ][transport.netty ] [Node-5] Message not fully read (response) for [2698] handler org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$4@2bc2db6d, error [true], resetting
[2016-02-25 09:33:53,803][WARN ][discovery.zen.ping.unicast] [KIBANA-ALPHA] failed to send ping to [[#zen_unicast_2#][Node-5][inet[/172.16.0.4:9300]]]
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:178)

I get this error on the other end (172.16.0.4):
[2016-02-25 09:34:24,545][WARN ][transport.netty ] [ELASTICSEARSH-ALPHA] exception caught on transport layer [[id: 0x9626bf34, /172.31.0.3:43864 => /172.31.0.65:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [2770], action [internal:discovery/zen/unicast_gte_1_4], readerIndex [59] vs expected [199]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived

After searching, I turned up this command:
curl 'localhost:9200/_nodes/jvm?pretty'

On the new node I get:
{
"cluster_name" : "ELASTICSEARCH",
"nodes" : {
"lAPD4ZLtT3Kzcj38e9hf-g" : {
"name" : "NODE-5",
"transport_address" : "inet[/172.16.0.5:9300]",
"host" : "KIBANA-ALPHA",
"ip" : "127.0.1.1",
"version" : "1.6.2",
"build" : "NA",
"http_address" : "inet[/172.16.0.5:9200]",
"attributes" : {
"data" : "false",
"master" : "false"
},
"jvm" : {
"pid" : 1801,
"version" : "1.7.0_95",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.95-b01",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1456413210741,
"mem" : {
"heap_init_in_bytes" : 268435456,
"heap_max_in_bytes" : 1065025536,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 1065025536
},
"gc_collectors" : [ "Copy", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Eden Space", "Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
}
}
}

On the other nodes I show:
{
"cluster_name" : "ELASTICSEARCH",
"nodes" : {
"xQXVmG3UTQKJ0lcgEjLR7g" : {
"name" : "NODE1",
"transport_address" : "172.16.0.1:9300",
"host" : "172.16.0.1",
"ip" : "172.16.0.1",
"version" : "2.2.0",
"build" : "8ff36d1",
"http_address" : "172.16.0.1:9200",
"jvm" : {
"pid" : 1490,
"version" : "1.8.0_72",
"vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
"vm_version" : "25.72-b15",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1456347785056,
"mem" : {
"heap_init_in_bytes" : 268435456,
"heap_max_in_bytes" : 1056309248,
"non_heap_init_in_bytes" : 2555904,
"non_heap_max_in_bytes" : 0,
"direct_max_in_bytes" : 1056309248
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Metaspace", "Compressed Class Space", "Par Eden Space", "Par Survivor Space", "CMS Old Gen" ],
"using_compressed_ordinary_object_pointers" : "true"
}
},

The only difference I see is in the transport address lines: "transport_address" : "inet[/172.16.0.5:9300]" on the new server versus "transport_address" : "172.16.0.1:9300" on the old servers. Any help would be appreciated.

Brad


(Steve Crickett) #2

Hi,

Your new node is trying to come up on ES 1.6.2 while the rest of your cluster is on 2.2.0. The new node is also using an older version of Java (1.7.0_95 vs. 1.8.0_72 on the existing nodes). If I were in your shoes, I would get the new node running the same versions of both Java and ES before going any further, to rule out any issues caused by incompatible versions.
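
If it helps to spot this kind of mismatch without reading the JSON by eye, here is a rough sketch in Python. It assumes you have already saved the `_nodes/jvm` response from each node (the new node is not in the cluster, so you need one response per side) and parsed each into a dict. The function name `find_version_mismatches` and the sample data are made up for illustration; only the `nodes`, `name`, `version`, and `jvm.version` fields come from the output posted above.

```python
from collections import defaultdict


def find_version_mismatches(responses):
    """Group node names by Elasticsearch version and by JVM version.

    `responses` is a list of parsed JSON bodies from GET /_nodes/jvm,
    one per node queried. Returns only the fields that show more than
    one distinct value across all nodes seen.
    """
    groups = {"es_version": defaultdict(list), "jvm_version": defaultdict(list)}
    for response in responses:
        for node in response.get("nodes", {}).values():
            name = node.get("name", "<unnamed>")
            es_version = node.get("version", "unknown")
            jvm_version = node.get("jvm", {}).get("version", "unknown")
            groups["es_version"][es_version].append(name)
            groups["jvm_version"][jvm_version].append(name)
    # Keep a field only when the nodes disagree on it.
    return {field: dict(found) for field, found in groups.items() if len(found) > 1}


# Trimmed-down copies of the two responses from post #1 (illustrative only).
new_node = {"nodes": {"lAPD4ZLtT3Kzcj38e9hf-g": {
    "name": "NODE-5", "version": "1.6.2", "jvm": {"version": "1.7.0_95"}}}}
old_node = {"nodes": {"xQXVmG3UTQKJ0lcgEjLR7g": {
    "name": "NODE1", "version": "2.2.0", "jvm": {"version": "1.8.0_72"}}}}

for field, found in find_version_mismatches([new_node, old_node]).items():
    print(field, found)
```

An empty result means every node reported the same ES and JVM versions; anything else lists which nodes are on which version.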

Steve


(Brad Birdwell) #3

Awesome. Updated the new one to 2.2.0 and off it went. Didn't even see that when I was reading through it.

Thanks,
Brad


(Steve Crickett) #4

If you haven't done so yet, I'd recommend upgrading the Java version on the new node to match the rest of your cluster also.


(Brad Birdwell) #5

I will get that done this evening when the firewalls are not shooting quite so much data at the cluster. Do you recommend OpenJDK or Oracle? I think I am running OpenJDK right now, but if I am going to upgrade, it might be a good time to make the switch if there is a solid argument for a change.


(Steve Crickett) #6

I haven't tried openJDK. My cluster is currently running jre-8u72-linux-x64.rpm and I haven't had any issues. I believe they are up to release jre-8u74-linux-x64.rpm.

