Upgraded node not rejoining non upgraded cluster 6.5.1 to 6.7 upgrade - IOException[Invalid string; unexpected character: 255 hex: ff]

smbnj · March 29, 2019, 3:20pm

Hi @DavidTurner, won't bringing down one of the two remaining nodes to upgrade it to 6.6.2 cause the cluster to fail and lose data?

John_Swift · March 29, 2019, 3:27pm

@DavidTurner we are now fully upgraded!
gUX9 10.137.54.61 9300 6.7.0 - WP005882-SCOLO-NODE4
9uRH 10.137.48.60 9300 6.7.0 - WP005881-SCOLO-NODE3
VzRI 10.105.50.53 9300 6.7.0 - WP005885-ONYX-NODE1
JqqW 10.105.158.40 9300 6.7.0 * WP005878-ONYX-NODE2

Thanks for your and @jasontedor's help on this.
In hindsight, I don't think the Machine Learning issue was related at all: it looks like when I started that node back up, it was the 6.5.1 version that started, not 6.7.

Just to summarise, in case anyone else is having the same issues:

A node upgraded from 6.5.1 to 6.7.0 failed to rejoin the cluster with the error: failed to send join request to master IOException[Invalid string; unexpected character: 255 hex: ff]; ]

The workaround is to upgrade the entire cluster to 6.6.2 first.

Use GET _cat/nodes?v&h=id,ip,port,v,m,n to verify that all nodes are on 6.6.2.

Then upgrade to 6.7.
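The "verify all nodes are on 6.6.2" step can be scripted. Below is a minimal Python sketch that parses the plain-text output of GET _cat/nodes?v&h=id,ip,port,v,m,n and lists any nodes not yet on the target version. It assumes the columns come back in the requested order and that the `v` flag adds exactly one header line; the function names are hypothetical, and fetching the text from the cluster is left out.

```python
from typing import Dict, List


def versions_by_node(cat_nodes_text: str, has_header: bool = True) -> Dict[str, str]:
    """Map node name -> version from `_cat/nodes?v&h=id,ip,port,v,m,n` output."""
    lines = [ln for ln in cat_nodes_text.splitlines() if ln.strip()]
    if has_header:
        lines = lines[1:]  # `?v` adds a single header row
    result = {}
    for line in lines:
        # Columns: id, ip, port, version, master marker (* or -), name.
        # maxsplit=5 keeps a node name containing spaces in one piece.
        _id, _ip, _port, version, _master, name = line.split(maxsplit=5)
        result[name] = version
    return result


def nodes_not_on(cat_nodes_text: str, target: str, has_header: bool = True) -> List[str]:
    """Return the (sorted) names of nodes whose version differs from `target`."""
    versions = versions_by_node(cat_nodes_text, has_header)
    return sorted(name for name, version in versions.items() if version != target)
```

Fed the four-node listing from the post above (which has no header row) with `has_header=False` and a target of "6.7.0", `nodes_not_on` returns an empty list; only when it is empty for "6.6.2" would you move on to 6.7.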

I also ran into a couple of other issues that were easy enough to resolve:

The service doesn't start if the directory %CONFIG_DIR%/ingest-geoip exists. I just renamed it to ingest-geoip_old and then started the service.
The MSI installer repeatedly failed to install 6.6.2, so I uninstalled 6.5.1 through the Windows remove-programs dialog and did a fresh install. Our data and config directories were outside the program install location, so they weren't removed during the uninstall. If yours are inside it, they will be removed, so back them up first if you haven't already.
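The ingest-geoip rename workaround above can be expressed as a small Python sketch. It assumes you pass in the resolved config directory (what the post calls %CONFIG_DIR%) and that the service is stopped first; the function name and the refusal to overwrite an existing ingest-geoip_old are illustrative choices, not part of any Elastic tooling.

```python
from pathlib import Path
from typing import Optional


def rename_stale_geoip_dir(config_dir: str, suffix: str = "_old") -> Optional[Path]:
    """Rename <config_dir>/ingest-geoip to ingest-geoip_old if it exists.

    Returns the new path, or None if there was nothing to rename.
    Run this only while the Elasticsearch service is stopped.
    """
    src = Path(config_dir) / "ingest-geoip"
    if not src.is_dir():
        return None
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        # Don't silently clobber a backup left by an earlier attempt.
        raise FileExistsError(f"{dst} already exists; move it aside first")
    src.rename(dst)
    return dst
```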

DavidTurner · March 29, 2019, 3:53pm

It's a risk, yes. The cluster will be unavailable while you're upgrading that node, but it should return to health once the node restarts. If you can't tolerate this, then the safest path is something like this:

temporarily start another empty 6.6.2 node; wait for the cluster health to be green.
upgrade the other nodes to 6.6.2
bring the already-upgraded 6.7.0 node up
decommission the temporary node by adding shard allocation filters to move any data onto the other nodes; wait for this process to complete
shut down the temporary node
upgrade the other two 6.6.2 nodes to 6.7.0.
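The decommissioning step in the list above can be sketched in Python. This assumes the temporary node is excluded by name through the cluster-level setting cluster.routing.allocation.exclude._name (sent as the body of PUT _cluster/settings), and that progress is checked by counting shard copies still listed against that node in GET _cat/shards?h=index,shard,prirep,state,node. The helper names and the example node name "temp-node" are hypothetical, and node names containing spaces would break the simple column split.

```python
import json
from typing import Optional

EXCLUDE_NAME = "cluster.routing.allocation.exclude._name"


def exclude_node_body(node_name: Optional[str]) -> str:
    """JSON body for PUT _cluster/settings.

    Pass the temporary node's name to start draining it, or None to
    clear the filter (serialised as null) once the node is shut down.
    """
    return json.dumps({"transient": {EXCLUDE_NAME: node_name}})


def shards_remaining(cat_shards_text: str, node_name: str) -> int:
    """Count shard copies still on `node_name`.

    Expects `_cat/shards?h=index,shard,prirep,state,node` output without a
    header. A RELOCATING shard lists its source node first, so a copy that is
    still moving off the node is (correctly) counted as remaining.
    """
    count = 0
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # UNASSIGNED shards have no node column, so len(cols) is 4.
        if len(cols) >= 5 and cols[4] == node_name:
            count += 1
    return count
```

Once `shards_remaining(...)` reaches zero for the temporary node, it should be safe to shut it down and then clear the filter with `exclude_node_body(None)`.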

bmagistro · March 30, 2019, 10:44pm

@John_Swift I am not sure I would rule out the issue being related to Machine Learning yet.

We upgraded our cluster from 6.4.3 to 6.7.0, and when the first node started up it logged the IOException[Invalid string; unexpected character: 255 hex: ff] message. As the only jobs we had were test ones, we were able to remove them. Once the jobs were removed, the node rejoined the cluster without issue.

DavidTurner · March 31, 2019, 8:48am

That's right, @bmagistro: this is caused by a problem in how ML datafeeds are transferred over the network, which is fixed in #40610.
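Since the failure is tied to ML datafeeds, one precaution before a rolling upgrade is to check whether the cluster has any datafeeds at all. The Python sketch below assumes the JSON response shape of the 6.x get-datafeeds API (GET _xpack/ml/datafeeds, which returns a "count" and a "datafeeds" array whose entries carry a "datafeed_id"). It only reports which datafeeds exist; it does not determine whether a particular cluster will actually hit the bug, and the function name is hypothetical.

```python
import json
from typing import List


def datafeed_ids(response_text: str) -> List[str]:
    """Extract datafeed IDs from a GET _xpack/ml/datafeeds response body."""
    body = json.loads(response_text)
    return [feed["datafeed_id"] for feed in body.get("datafeeds", [])]
```

An empty list means there are no datafeeds to serialise between nodes. A non-empty list is a hint that the intermediate-version route described earlier in this thread may be the safer option.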

smbnj · April 1, 2019, 8:58am

Fortunately, I was able to downgrade from 6.7 to 6.6.2 with rpm, and index compatibility was maintained. This allowed us to perform a rolling upgrade to 6.6.2 and then to 6.7 without losing cluster resilience.

DavidTurner · April 1, 2019, 10:45am

I'm glad to hear that this worked for you, but I should point out for the benefit of other readers that downgrading a node is very much unsupported and can result in a very broken cluster. The supported path forward is to start new 6.6.2 nodes.
