Upgrade from 5.1.0 to 5.1.1 on CentOS 6 makes a node unable to join the cluster

I have a simple 3-node ELK 5.1.0 cluster that I am trying to upgrade to 5.1.1 using the provided RPMs.

I can't see anything in the release notes for 5.1.1 that could cause this.

The cluster works as expected on 5.1.0, but when I upgrade one node to 5.1.1 and start it up with exactly the same configuration, I get the following (only pasting what I think is relevant out of the Java stack trace):

[2016-12-12T10:08:53,557][INFO ][o.e.n.Node               ] [infra-elk-es1.lon-dc.mintel.ad] initializing ...
[2016-12-12T10:08:53,610][INFO ][o.e.e.NodeEnvironment    ] [infra-elk-es1.lon-dc.mintel.ad] using [1] data paths, mounts [[/data (/dev/vdb)]], net usable_space [397.8gb], net total_space [499.7gb], spins? [possibly], types [xfs]
[2016-12-12T10:08:53,611][INFO ][o.e.e.NodeEnvironment    ] [infra-elk-es1.lon-dc.mintel.ad] heap size [7.9gb], compressed ordinary object pointers [true]
[2016-12-12T10:08:54,674][INFO ][o.e.n.Node               ] [infra-elk-es1.lon-dc.mintel.ad] node name [infra-elk-es1.lon-dc.mintel.ad], node ID [vN6s_3-XS2i11w1v59o8dg]
[2016-12-12T10:08:54,676][INFO ][o.e.n.Node               ] [infra-elk-es1.lon-dc.mintel.ad] version[5.1.1], pid[18954], build[5395e21/2016-12-06T12:36:15.409Z], OS[Linux/2.6.32-642.6.2.el6.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b15]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [aggs-matrix-stats]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [ingest-common]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-expression]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-groovy]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-mustache]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-painless]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [percolator]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [reindex]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [transport-netty3]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [transport-netty4]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService     ] [infra-elk-es1.lon-dc.mintel.ad] no plugins loaded
[2016-12-12T10:09:01,099][INFO ][o.e.n.Node               ] [infra-elk-es1.lon-dc.mintel.ad] initialized
[2016-12-12T10:09:01,099][INFO ][o.e.n.Node               ] [infra-elk-es1.lon-dc.mintel.ad] starting ...
[2016-12-12T10:09:01,223][INFO ][o.e.t.TransportService   ] [infra-elk-es1.lon-dc.mintel.ad] publish_address {172.31.0.6:9300}, bound_addresses {0.0.0.0:9300}
[2016-12-12T10:09:01,228][INFO ][o.e.b.BootstrapCheck     ] [infra-elk-es1.lon-dc.mintel.ad] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2016-12-12T10:09:04,305][INFO ][o.e.d.z.ZenDiscovery     ] [infra-elk-es1.lon-dc.mintel.ad] failed to send join request to master [{infra-elk-es4.lon-dc.mintel.ad}{5CA_TtyUSDau2ZtJBKWzyQ}{y73HV3ReSi67NGuyk9Shhg}{172.31.1.232}{172.31.1.232:9300}], reason [RemoteTransportException[[Failed to deserialize exception response from stream]]; nested: TransportSerializationException[Failed to deserialize exception response from stream]; nested: IllegalArgumentException[port out of range:2380801]; ]
[2016-12-12T10:09:04,332][WARN ][o.e.t.n.Netty4Transport  ] [infra-elk-es1.lon-dc.mintel.ad] exception caught on transport layer [[id: 0xad57d6a8, L:/172.31.0.6:40484 - R:172.31.1.232/172.31.1.232:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [14], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/future(org.elasticsearch.transport.EmptyTransportResponseHandler@77d1e794)], error [true]; resetting

The line about the port being out of range is particularly interesting to me:

port out of range:2380801 

Sure, that port is out of range if we are talking about TCP ports ...

After this the node tries to join again and again, with the same error and the exact same "port out of range" message.

# java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-b15)
OpenJDK 64-Bit Server VM (build 25.111-b15, mixed mode)

Reverting the node to 5.1.0 fixes the problem for now.

Is anyone experiencing the same issue, or does anyone have an idea what I am doing wrong?

Did you really have a 5.1.0 version before? I'm asking because 5.1.0 should not have been released, as it was not ready for production.

From: Elastic Stack 5.1.1 Released | Elastic Blog

Yup, you read that right. Version 5.1.0 doesn’t exist because, for a short period of time, the Elastic Yum and Apt repositories included unreleased binaries labeled 5.1.0. To avoid any confusion, and upgrade issues for the people that have installed these without realizing, we have decided to skip the 5.1.0 version and release 5.1.1 instead.

So, can you confirm what this gives:

GET /
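Or, to see at a glance which build every node that has joined the cluster is running, the cat nodes API works too (assuming each node answers HTTP on localhost:9200, adjust the host if not):

# one line per joined node: name, IP and Elasticsearch version
curl 'localhost:9200/_cat/nodes?v&h=name,ip,version'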

I do mirror the repo, so I guess I was lucky enough to mirror it during that short period of time.

[root@infra-elk-es1 ~]# curl localhost:9200/ 2>/dev/null | jq .
{
  "tagline": "You Know, for Search",
  "version": {
    "lucene_version": "6.3.0",
    "build_snapshot": false,
    "build_date": "2016-11-24T08:20:05.232Z",
    "build_hash": "e5e3f1f",
    "number": "5.1.0"
  },
  "cluster_uuid": "FQXt2XxHSXaxbAiHXEea9g",
  "cluster_name": "infra_elk",
  "name": "infra-elk-es1.lon-dc.mintel.ad"
}

Could there be a flag in 5.1.1 that prevents it from joining a 5.1.0 cluster, since 5.1.0 is kind of marked as broken?

I can totally bring this cluster down, upgrade all nodes to 5.1.1 and bring it back up if that helps ... I just don't want to do it purely as a test, since the downgrade might become messy.

At the moment testing the upgrade and downgrading again is really easy, since the updated node simply does not join.
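If it does come to a full restart, the plan would be roughly the standard full-cluster-restart upgrade. A sketch of the usual steps (not commands I have actually run here; service and package names are as shipped in the RPM):

# on one node, before stopping anything: stop shard allocation and try a synced flush
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
curl -XPOST 'localhost:9200/_flush/synced'

# on every node: stop the service, upgrade the package from the (mirrored) repo, start it again
service elasticsearch stop
yum update elasticsearch
service elasticsearch start

# once all nodes are back in the cluster: re-enable allocation and watch recovery
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
curl 'localhost:9200/_cat/health?v'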

Could there be a flag in 5.1.1 that prevents it from joining a 5.1.0 cluster, since 5.1.0 is kind of marked as broken?

No, there is not, AFAIK, but for sure something weird is happening.

I'll try to reproduce your case in the next few hours to see if I can find anything.

Thanks for opening that. Maybe it is worth opening an issue on GitHub in the meantime and linking it to this discussion?

Will do, thanks

Let me know if there is any other information you need.

I am running on CentOS 6.8; the Java version is posted in the first comment.

Config:

cluster.name: infra_elk
node.name: infra-elk-es1.lon-dc.mintel.ad
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
bootstrap.memory_lock: true
http.host: 127.0.0.1
transport.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["infra-elk-es1.lon-dc.mintel.ad", "infra-elk-es4.lon-dc.mintel.ad", "infra-elk-es3.lon-dc.mintel.ad"]
discovery.zen.minimum_master_nodes: 2
http.cors.enabled: true
http.cors.allow-origin: "/.*/"

In jvm.options I am only setting the heap; everything else is as shipped with the RPM:

-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-server
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
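A quick way to confirm the heap setting is actually being picked up (again assuming the node answers on localhost:9200) is the nodes info API:

# reports heap_init / heap_max as seen by the running JVM on each node
curl 'localhost:9200/_nodes/jvm?pretty&filter_path=nodes.*.jvm.mem'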

GitHub issue: 22113

A full cluster restart with the upgrade seems to have worked fine.
