I have a simple 3-node ELK 5.1.0 cluster that I am trying to upgrade to 5.1.1 using the provided RPMs.
I can't see anything in the 5.1.1 release notes that could cause this.
The cluster works as expected on 5.1.0, but when I upgrade one node to 5.1.1 and start it with exactly the same configuration, I get the following (only pasting what I think is relevant from the log and the Java stack trace):
[2016-12-12T10:08:53,557][INFO ][o.e.n.Node ] [infra-elk-es1.lon-dc.mintel.ad] initializing ...
[2016-12-12T10:08:53,610][INFO ][o.e.e.NodeEnvironment ] [infra-elk-es1.lon-dc.mintel.ad] using [1] data paths, mounts [[/data (/dev/vdb)]], net usable_space [397.8gb], net total_space [499.7gb], spins? [possibly], types [xfs]
[2016-12-12T10:08:53,611][INFO ][o.e.e.NodeEnvironment ] [infra-elk-es1.lon-dc.mintel.ad] heap size [7.9gb], compressed ordinary object pointers [true]
[2016-12-12T10:08:54,674][INFO ][o.e.n.Node ] [infra-elk-es1.lon-dc.mintel.ad] node name [infra-elk-es1.lon-dc.mintel.ad], node ID [vN6s_3-XS2i11w1v59o8dg]
[2016-12-12T10:08:54,676][INFO ][o.e.n.Node ] [infra-elk-es1.lon-dc.mintel.ad] version[5.1.1], pid[18954], build[5395e21/2016-12-06T12:36:15.409Z], OS[Linux/2.6.32-642.6.2.el6.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b15]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [aggs-matrix-stats]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [ingest-common]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-expression]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-groovy]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-mustache]
[2016-12-12T10:08:55,317][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [lang-painless]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [percolator]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [reindex]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [transport-netty3]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] loaded module [transport-netty4]
[2016-12-12T10:08:55,318][INFO ][o.e.p.PluginsService ] [infra-elk-es1.lon-dc.mintel.ad] no plugins loaded
[2016-12-12T10:09:01,099][INFO ][o.e.n.Node ] [infra-elk-es1.lon-dc.mintel.ad] initialized
[2016-12-12T10:09:01,099][INFO ][o.e.n.Node ] [infra-elk-es1.lon-dc.mintel.ad] starting ...
[2016-12-12T10:09:01,223][INFO ][o.e.t.TransportService ] [infra-elk-es1.lon-dc.mintel.ad] publish_address {172.31.0.6:9300}, bound_addresses {0.0.0.0:9300}
[2016-12-12T10:09:01,228][INFO ][o.e.b.BootstrapCheck ] [infra-elk-es1.lon-dc.mintel.ad] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2016-12-12T10:09:04,305][INFO ][o.e.d.z.ZenDiscovery ] [infra-elk-es1.lon-dc.mintel.ad] failed to send join request to master [{infra-elk-es4.lon-dc.mintel.ad}{5CA_TtyUSDau2ZtJBKWzyQ}{y73HV3ReSi67NGuyk9Shhg}{172.31.1.232}{172.31.1.232:9300}], reason [RemoteTransportException[[Failed to deserialize exception response from stream]]; nested: TransportSerializationException[Failed to deserialize exception response from stream]; nested: IllegalArgumentException[port out of range:2380801]; ]
[2016-12-12T10:09:04,332][WARN ][o.e.t.n.Netty4Transport ] [infra-elk-es1.lon-dc.mintel.ad] exception caught on transport layer [[id: 0xad57d6a8, L:/172.31.0.6:40484 - R:172.31.1.232/172.31.1.232:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [14], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/future(org.elasticsearch.transport.EmptyTransportResponseHandler@77d1e794)], error [true]; resetting
The line about the port being out of range is particularly interesting to me:
port out of range:2380801
That value is certainly out of range for a TCP port, where the maximum is 65535.
After this, the node tries to join again and again, failing each time with the same error and the exact same "port out of range" message.
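As a quick sanity check on that number, here is a minimal sketch in plain Java (no Elasticsearch classes). It shows two things: that `java.net.InetSocketAddress` rejects 2380801 as a port while accepting 9300, and that 2380801 is exactly 9300 * 256 + 1, i.e. the transport port 9300 shifted left by one byte. The class name `PortOutOfRange` and the helper `rejectMessage` are made up for this illustration. Reading this as a one-byte misalignment while deserializing the response between 5.1.0 and 5.1.1 is only my guess from the arithmetic, not something I have confirmed.

```java
import java.net.InetSocketAddress;

public class PortOutOfRange {
    // Value reported in the join failure in the log above.
    static final int REPORTED = 2380801;

    /**
     * Hypothetical helper: returns the exception message if the JDK rejects
     * the port, or null if the port is accepted.
     */
    static String rejectMessage(int port) {
        try {
            // The single-argument constructor validates the port and uses the
            // wildcard address, so no DNS lookup or socket is involved.
            new InetSocketAddress(port);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("9300 rejected?    " + (rejectMessage(9300) != null));
        System.out.println("2380801 rejected? " + (rejectMessage(REPORTED) != null));

        // Split the reported value into its upper bytes and its lowest byte.
        System.out.println("hex:        0x" + Integer.toHexString(REPORTED));
        System.out.println("upper part: " + (REPORTED >>> 8));
        System.out.println("low byte:   " + (REPORTED & 0xFF));
    }
}
```

If the guess is right, the upgraded node and the 5.1.0 master disagree by one byte about where the address sits in the serialized exception, which would also fit the "Message not fully read" error that follows.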
# java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-b15)
OpenJDK 64-Bit Server VM (build 25.111-b15, mixed mode)
Reverting the node to 5.1.0 fixes the problem for now.
Is anyone else experiencing the same issue, or does anyone have an idea what I am doing wrong?