Elasticsearch upgrade from 6.4.1 to 6.7.0, upgraded node is unable to join the cluster

Hello,

Elasticsearch, Logstash, Kibana - version 6.4.1

I am doing a rolling upgrade.

I have 3 nodes in the ELK cluster. I upgraded Elasticsearch on node3 from 6.4.1 to 6.7.0, and now that node cannot join the cluster and I get the following errors.

In the logs,

[2019-04-04T10:37:45,617][WARN ][o.e.c.l.LogConfigurator  ] Some logging configurations have %marker but don't have %node_name. We will automatically add %node_name to the pattern to ease the migration for users who customize log4j2.properties but will stop this behavior in 7.0. You should manually replace `%node_name` with `[%node_name]%marker ` in these locations:
[2019-04-04T10:37:45,905][INFO ][o.e.e.NodeEnvironment    ] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [10.4gb], net total_space [30.3gb], types [ext4]
[2019-04-04T10:37:45,906][INFO ][o.e.e.NodeEnvironment    ] heap size [11.9gb], compressed ordinary object pointers [true]
[2019-04-04T10:37:46,388][INFO ][o.e.n.Node               ] node name [node3], node ID [odIrW6POTeS21bKLlRS7_Q]
[2019-04-04T10:37:46,389][INFO ][o.e.n.Node               ] version[6.7.0], pid[20803], build[default/deb/8453f77/2019-03-21T15:32:29.844721Z], OS[Linux/4.4.0-130-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_171/25.171-b11]
[2019-04-04T10:37:46,389][INFO ][o.e.n.Node               ] JVM arguments [-Dfile.encoding=UTF-8, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djna.nosys=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+AlwaysPreTouch, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintGCDateStamps, -XX:+PrintGCDetails, -XX:+PrintTenuringDistribution, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+UseConcMarkSweepGC, -XX:+UseGCLogFileRotation, -XX:-OmitStackTraceInFastThrow, -XX:CMSInitiatingOccupancyFraction=75, -XX:GCLogFileSize=64m, -XX:NumberOfGCLogFiles=32, -Xloggc:/var/log/elasticsearchgc.log, -Xms12g, -Xmx12g, -Xss1m, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb]
[2019-04-04T10:37:48,516][INFO ][o.e.p.PluginsService     ] loaded module [aggs-matrix-stats]
[2019-04-04T10:37:48,516][INFO ][o.e.p.PluginsService     ] loaded module [analysis-common]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] loaded module [ingest-common]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] loaded module [ingest-geoip]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] loaded module [ingest-user-agent]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [lang-expression]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [lang-mustache]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [lang-painless]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [mapper-extras]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [parent-join]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [percolator]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [rank-eval]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [reindex]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [repository-url]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [transport-netty4]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [tribe]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-ccr]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-core]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-deprecation]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-graph]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-ilm]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-logstash]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-ml]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-monitoring]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-rollup]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-security]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-sql]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-upgrade]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-watcher]
[2019-04-04T10:37:48,519][INFO ][o.e.p.PluginsService     ] [] no plugins loaded
[2019-04-04T10:37:52,488][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [] [controller/20900] [Main.cc@109] controller (64 bit): Version 6.7.0 (Build d74ae2ac01b10d) Copyright (c) 2019 Elasticsearch BV
[2019-04-04T10:37:54,882][INFO ][o.e.d.DiscoveryModule    ] [] using discovery type [zen] and host providers [settings]
[2019-04-04T10:37:55,627][INFO ][o.e.n.Node               ] initialized
[2019-04-04T10:37:55,628][INFO ][o.e.n.Node               ] [] starting ...
[2019-04-04T10:37:55,790][INFO ][o.e.t.TransportService   ] [] publish_address {}, bound_addresses {}
[2019-04-04T10:37:56,445][INFO ][o.e.b.BootstrapChecks    ] [] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-04-04T10:37:59,643][INFO ][o.e.d.z.ZenDiscovery     ] [node3] failed to send join request to master [{node1}{KYovaukISHCpOp-d8eChRA}{HZbXCjfQRCyV75bFJMOj_w}{}{ml.machine_memory=135077466112, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[elastic-1][][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[internal:discovery/zen/join/validate]]; nested: IOException[Invalid string; unexpected character: 255 hex: ff]; ]

node3 elasticsearch.yml file

cluster.name: clustername
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
- node1
- node2
- node3
network.host: 0.0.0.0
node.data: false
node.ingest: false
node.name: node3
path.data: "/var/lib/elasticsearch"
path.logs: "/var/log/elasticsearch"
transport.host: _site_
xpack.security.enabled: false

Can someone help me with this ASAP?

Thanks

Node1 logs

[2019-04-04T11:00:25,482][WARN ][o.e.d.z.ZenDiscovery     ] [node1] failed to validate incoming join request from node [{node3}{odIrW6POTeS21bKLlRS7_Q}{xoUpgW5iRGGgHUHBdg4vFA}{10.10.11.90}{10.10.11.90:9300}{ml.machine_memory=25282514944, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]
org.elasticsearch.transport.RemoteTransportException: [node3][][internal:discovery/zen/join/validate]
Caused by: java.io.IOException: Invalid string; unexpected character: 255 hex: ff
        at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:402) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:38) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.<init>(AggregatorFactories.java:263) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:777) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.xpack.core.ml.datafeed.AggProvider.fromStream(AggProvider.java:79) ~[?:?]
        at org.elasticsearch.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:777) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.xpack.core.ml.datafeed.DatafeedConfig.<init>(DatafeedConfig.java:222) ~[?:?]
        at org.elasticsearch.xpack.core.ml.MlMetadata.<init>(MlMetadata.java:160) ~[?:?]
        at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.cluster.metadata.MetaData.readFrom(MetaData.java:842) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.cluster.ClusterState.readFrom(ClusterState.java:754) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.discovery.zen.MembershipAction$ValidateJoinRequest.readFrom(MembershipAction.java:177) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.Streamable.lambda$newWriteableReader$0(Streamable.java:51) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:56) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1042) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

I think this looks like the issue fixed by https://github.com/elastic/elasticsearch/pull/40610. Either wait for 6.7.1 before upgrading, or else it should work to upgrade your whole cluster to 6.6.2 first and then on to 6.7.0.

So I should go back to 6.4.1 on node3 and upgrade from 6.4.1 to 6.6.2 first? And I should upgrade one node at a time: upgrade node3, let it join the cluster, and then move on to node2?

6.7.1 is now released, so the simplest thing to do now is upgrade to that.
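Per node, the upgrade cycle looks roughly like the sketch below (assuming the deb packages, systemd, and a node reachable on localhost:9200; see the rolling upgrade documentation for the authoritative steps):

# 1. Disable replica allocation so shards are not shuffled while the node is down
curl -s -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent": { "cluster.routing.allocation.enable": "primaries" } }'

# 2. Optionally perform a synced flush to speed up shard recovery
curl -s -X POST 'localhost:9200/_flush/synced'

# 3. Stop the node, upgrade the package, start it again
sudo systemctl stop elasticsearch
sudo apt-get install elasticsearch=6.7.1
sudo systemctl start elasticsearch

# 4. Re-enable allocation and wait for the cluster to recover before the next node
curl -s -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent": { "cluster.routing.allocation.enable": null } }'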

I upgraded to 6.6.2. I tried upgrading node1 as well, but it is not picking up the new version. node1 and node2 are master, data and ingest nodes; node3 is a master-only node.

GET _cat/nodes?h=name,ip,version

node1 x.x.x.x 6.4.1
node2 x.x.x.x 6.6.2
node3 x.x.x.x 6.6.2

GET /_cluster/allocation/explain

{
  "index": "metricbeat-6.6.2-2019.03.21",
  "shard": 4,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2019-04-09T17:57:07.106Z",
    "details": "node_left[KYovaukISHCpOp-d8eChRA]",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "KYovaukISHCpOp-d8eChRA",
      "node_name": "node1",
      "transport_address": "",
      "node_attributes": {
        "ml.machine_memory": "135077466112",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "node_version",
          "decision": "NO",
          "explanation": "cannot allocate replica shard to a node with version [6.4.1] since this is older than the primary version [6.6.2]"
        }
      ]
    },
    {
      "node_id": "x-neLgNPQRW0wrnnzg_w9w",
      "node_name": "node2",
      "transport_address": "",
      "node_attributes": {
        "ml.machine_memory": "135077466112",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[metricbeat-6.6.2-2019.03.21][4], node[x-neLgNPQRW0wrnnzg_w9w], [P], s[STARTED], a[id=-CAat8JtSFS6yQsNzf9afg]]"
        }
      ]
    }
  ]
}

Replica shards are not assigned to node1 because it is still on 6.4.1.

How do I proceed with the upgrade?

This is normal, I think - see step 9 of the rolling upgrade documentation:

During a rolling upgrade, primary shards assigned to a node running the new version cannot have their replicas assigned to a node with the old version. The new version might have a different data format that is not understood by the old version.

If it is not possible to assign the replica shards to another node (there is only one upgraded node in the cluster), the replica shards remain unassigned and status stays yellow.

In this case, you can proceed once there are no initializing or relocating shards (check the init and relo columns).
As soon as another node is upgraded, the replicas can be assigned and the status will change to green.
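For example, something along these lines (a sketch, assuming you can reach a node on localhost:9200) shows the status and the init/relo counts:

# overall cluster status, including the init and relo columns
curl -s 'localhost:9200/_cat/health?v'

# list any shards that are still initializing or relocating
curl -s 'localhost:9200/_cat/shards?v' | grep -E 'INITIALIZING|RELOCATING'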

I am upgrading through Puppet. I set the version to 6.6.2 and ran the Puppet agent on node1.

I checked the version on the server:

apt-cache policy elasticsearch

elasticsearch:
  Installed: 6.6.2
  Candidate: 5.6.0

but when I check through the cluster API, it still shows 6.4.1:

GET /_nodes/node1

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "",
  "nodes": {
    "KYovaukISHCpOp-d8eChRA": {
      "name": "node1",
      "transport_address": "",
      "host": "",
      "ip": "",
      "version": "6.4.1",
      "build_flavor": "default",
      "build_type": "deb",
      "build_hash": "e36acdb",
      "total_indexing_buffer": 6722702540,
      "roles": [
        "master",
        "data",
        "ingest"
      ],

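So the package on disk is 6.6.2 but the running node still reports 6.4.1. For reference, this is roughly how I compared the two on the node itself (a sketch, assuming the default HTTP port on localhost:9200):

# version of the installed deb package
dpkg-query -W -f='${Version}\n' elasticsearch

# version reported by the running process
curl -s 'localhost:9200/_nodes/_local?filter_path=nodes.*.version'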
Sorry, I don't know much about Puppet. The situation you describe sounds ok from Elasticsearch's point of view, but maybe your Puppet configuration doesn't handle this case correctly?

OK, thanks for your help, David. I will look into it.

I am confused: I upgraded the other two nodes through Puppet, but for node1 it is not working.

When I manually restarted the service, the node came up on 6.6.2 and all unassigned shards were allocated.
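For anyone else hitting this, the fix on node1 was simply (a sketch, same assumptions as above):

sudo systemctl restart elasticsearch

# confirm the node now reports the upgraded version
curl -s 'localhost:9200/_cat/nodes?h=name,version'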

