Elasticsearch upgrade from 6.4.1 to 6.7.0, upgraded node is unable to join the cluster

Hello,

Elasticsearch, Logstash, Kibana - version 6.4.1

I am doing a rolling upgrade.

I have 3 nodes in the ELK cluster. I upgraded Elasticsearch on node3 from 6.4.1 to 6.7.0, and now that node cannot join the cluster and I get the following errors.

In the logs,

[2019-04-04T10:37:45,617][WARN ][o.e.c.l.LogConfigurator  ] Some logging configurations have %marker but don't have %node_name. We will automatically add %node_name to the pattern to ease the migration for users who customize log4j2.properties but will stop this behavior in 7.0. You should manually replace `%node_name` with `[%node_name]%marker ` in these locations:
[2019-04-04T10:37:45,905][INFO ][o.e.e.NodeEnvironment    ] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [10.4gb], net total_space [30.3gb], types [ext4]
[2019-04-04T10:37:45,906][INFO ][o.e.e.NodeEnvironment    ] heap size [11.9gb], compressed ordinary object pointers [true]
[2019-04-04T10:37:46,388][INFO ][o.e.n.Node               ] node name [node3], node ID [odIrW6POTeS21bKLlRS7_Q]
[2019-04-04T10:37:46,389][INFO ][o.e.n.Node               ] version[6.7.0], pid[20803], build[default/deb/8453f77/2019-03-21T15:32:29.844721Z], OS[Linux/4.4.0-130-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_171/25.171-b11]
[2019-04-04T10:37:46,389][INFO ][o.e.n.Node               ] JVM arguments [-Dfile.encoding=UTF-8, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djna.nosys=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+AlwaysPreTouch, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintGCDateStamps, -XX:+PrintGCDetails, -XX:+PrintTenuringDistribution, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+UseConcMarkSweepGC, -XX:+UseGCLogFileRotation, -XX:-OmitStackTraceInFastThrow, -XX:CMSInitiatingOccupancyFraction=75, -XX:GCLogFileSize=64m, -XX:NumberOfGCLogFiles=32, -Xloggc:/var/log/elasticsearchgc.log, -Xms12g, -Xmx12g, -Xss1m, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb]
[2019-04-04T10:37:48,516][INFO ][o.e.p.PluginsService     ] loaded module [aggs-matrix-stats]
[2019-04-04T10:37:48,516][INFO ][o.e.p.PluginsService     ] loaded module [analysis-common]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] loaded module [ingest-common]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] loaded module [ingest-geoip]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] loaded module [ingest-user-agent]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [lang-expression]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [lang-mustache]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [lang-painless]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [mapper-extras]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [parent-join]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [percolator]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [rank-eval]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [reindex]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [repository-url]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [transport-netty4]
[2019-04-04T10:37:48,517][INFO ][o.e.p.PluginsService     ] [] loaded module [tribe]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-ccr]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-core]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-deprecation]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-graph]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-ilm]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-logstash]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-ml]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-monitoring]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-rollup]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-security]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-sql]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-upgrade]
[2019-04-04T10:37:48,518][INFO ][o.e.p.PluginsService     ] [] loaded module [x-pack-watcher]
[2019-04-04T10:37:48,519][INFO ][o.e.p.PluginsService     ] [] no plugins loaded
[2019-04-04T10:37:52,488][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [] [controller/20900] [Main.cc@109] controller (64 bit): Version 6.7.0 (Build d74ae2ac01b10d) Copyright (c) 2019 Elasticsearch BV
[2019-04-04T10:37:54,882][INFO ][o.e.d.DiscoveryModule    ] [] using discovery type [zen] and host providers [settings]
[2019-04-04T10:37:55,627][INFO ][o.e.n.Node               ] initialized
[2019-04-04T10:37:55,628][INFO ][o.e.n.Node               ] [] starting ...
[2019-04-04T10:37:55,790][INFO ][o.e.t.TransportService   ] [] publish_address {}, bound_addresses {}
[2019-04-04T10:37:56,445][INFO ][o.e.b.BootstrapChecks    ] [] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-04-04T10:37:59,643][INFO ][o.e.d.z.ZenDiscovery     ] [node3] failed to send join request to master [{node1}{KYovaukISHCpOp-d8eChRA}{HZbXCjfQRCyV75bFJMOj_w}{}{ml.machine_memory=135077466112, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[elastic-1][][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[internal:discovery/zen/join/validate]]; nested: IOException[Invalid string; unexpected character: 255 hex: ff]; ]

node3 elasticsearch.yml file

cluster.name: clustername
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
- node1
- node2
- node3
network.host: 0.0.0.0
node.data: false
node.ingest: false
node.name: node3
path.data: "/var/lib/elasticsearch"
path.logs: "/var/log/elasticsearch"
transport.host: _site_
xpack.security.enabled: false

Can someone help me with this ASAP?

Thanks

Node1 logs

[2019-04-04T11:00:25,482][WARN ][o.e.d.z.ZenDiscovery     ] [node1] failed to validate incoming join request from node [{node3}{odIrW6POTeS21bKLlRS7_Q}{xoUpgW5iRGGgHUHBdg4vFA}{10.10.11.90}{10.10.11.90:9300}{ml.machine_memory=25282514944, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]
org.elasticsearch.transport.RemoteTransportException: [node3][][internal:discovery/zen/join/validate]
Caused by: java.io.IOException: Invalid string; unexpected character: 255 hex: ff
        at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:402) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:38) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.search.aggregations.AggregatorFactories$Builder.<init>(AggregatorFactories.java:263) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:777) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.xpack.core.ml.datafeed.AggProvider.fromStream(AggProvider.java:79) ~[?:?]
        at org.elasticsearch.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:777) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.xpack.core.ml.datafeed.DatafeedConfig.<init>(DatafeedConfig.java:222) ~[?:?]
        at org.elasticsearch.xpack.core.ml.MlMetadata.<init>(MlMetadata.java:160) ~[?:?]
        at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.cluster.metadata.MetaData.readFrom(MetaData.java:842) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.cluster.ClusterState.readFrom(ClusterState.java:754) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.discovery.zen.MembershipAction$ValidateJoinRequest.readFrom(MembershipAction.java:177) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.common.io.stream.Streamable.lambda$newWriteableReader$0(Streamable.java:51) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:56) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1042) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) ~[elasticsearch-6.4.1.jar:6.4.1]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

I think this looks like the issue fixed by https://github.com/elastic/elasticsearch/pull/40610. Either wait for 6.7.1 before upgrading, or else it should work to upgrade your whole cluster to 6.6.2 first and then on to 6.7.0.

So I should go back to 6.4.1 on node3 and upgrade from 6.4.1 to 6.6.2 first? And I should upgrade one node at a time: upgrade node3, let it join the cluster, and then move on to node2?

6.7.1 is now released, so the simplest thing to do now is upgrade to that.
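Per node, the upgrade cycle looks roughly like the sketch below (assuming the deb packages, systemd, and a node reachable on localhost:9200; see the rolling upgrade documentation for the authoritative steps):

# 1. Disable replica allocation so shards are not shuffled while the node is down
curl -s -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent": { "cluster.routing.allocation.enable": "primaries" } }'

# 2. Optionally perform a synced flush to speed up shard recovery
curl -s -X POST 'localhost:9200/_flush/synced'

# 3. Stop the node, upgrade the package, start it again
sudo systemctl stop elasticsearch
sudo apt-get install elasticsearch=6.7.1
sudo systemctl start elasticsearch

# 4. Re-enable allocation and wait for the cluster to recover before the next node
curl -s -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent": { "cluster.routing.allocation.enable": null } }'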

I upgraded to 6.6.2. I tried upgrading node1 as well, but it is not picking up the new version. node1 and node2 are master, data and ingest nodes; node3 is a master-only node.

GET _cat/nodes?h=name,ip,version

node1 x.x.x.x 6.4.1
node2 x.x.x.x 6.6.2
node3 x.x.x.x 6.6.2

GET /_cluster/allocation/explain

{
  "index": "metricbeat-6.6.2-2019.03.21",
  "shard": 4,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2019-04-09T17:57:07.106Z",
    "details": "node_left[KYovaukISHCpOp-d8eChRA]",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "KYovaukISHCpOp-d8eChRA",
      "node_name": "node1",
      "transport_address": "",
      "node_attributes": {
        "ml.machine_memory": "135077466112",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "node_version",
          "decision": "NO",
          "explanation": "cannot allocate replica shard to a node with version [6.4.1] since this is older than the primary version [6.6.2]"
        }
      ]
    },
    {
      "node_id": "x-neLgNPQRW0wrnnzg_w9w",
      "node_name": "node2",
      "transport_address": "",
      "node_attributes": {
        "ml.machine_memory": "135077466112",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "ml.enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[metricbeat-6.6.2-2019.03.21][4], node[x-neLgNPQRW0wrnnzg_w9w], [P], s[STARTED], a[id=-CAat8JtSFS6yQsNzf9afg]]"
        }
      ]
    }
  ]
}

Replica shards are not assigned to node1 because it is still on 6.4.1.

How do I proceed with the upgrade?

This is normal, I think - see step 9 of the rolling upgrade documentation:

During a rolling upgrade, primary shards assigned to a node running the new version cannot have their replicas assigned to a node with the old version. The new version might have a different data format that is not understood by the old version.

If it is not possible to assign the replica shards to another node (there is only one upgraded node in the cluster), the replica shards remain unassigned and status stays yellow.

In this case, you can proceed once there are no initializing or relocating shards (check the init and relo columns).
As soon as another node is upgraded, the replicas can be assigned and the status will change to green.
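For example, something along these lines (a sketch, assuming you can reach a node on localhost:9200) shows the status and the init/relo counts:

# overall cluster status, including the init and relo columns
curl -s 'localhost:9200/_cat/health?v'

# list any shards that are still initializing or relocating
curl -s 'localhost:9200/_cat/shards?v' | grep -E 'INITIALIZING|RELOCATING'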

I am upgrading through Puppet. I set the version to 6.6.2 and ran the Puppet agent on node1.

I checked the version on the server:

apt-cache policy elasticsearch

elasticsearch:
  Installed: 6.6.2
  Candidate: 5.6.0

but when I check through the cluster API, it still shows 6.4.1:

GET /_nodes/node1

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "",
  "nodes": {
    "KYovaukISHCpOp-d8eChRA": {
      "name": "node1",
      "transport_address": "",
      "host": "",
      "ip": "",
      "version": "6.4.1",
      "build_flavor": "default",
      "build_type": "deb",
      "build_hash": "e36acdb",
      "total_indexing_buffer": 6722702540,
      "roles": [
        "master",
        "data",
        "ingest"
      ],

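So the package on disk is 6.6.2 but the running node still reports 6.4.1. For reference, this is roughly how I compared the two on the node itself (a sketch, assuming the default HTTP port on localhost:9200):

# version of the installed deb package
dpkg-query -W -f='${Version}\n' elasticsearch

# version reported by the running process
curl -s 'localhost:9200/_nodes/_local?filter_path=nodes.*.version'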
Sorry, I don't know much about Puppet. The situation you describe sounds ok from Elasticsearch's point of view, but maybe your Puppet configuration doesn't handle this case correctly?

OK, thanks for your help, David. I will look into it.

I am confused: I upgraded the other two nodes through Puppet, but for node1 it is not working.

When I manually restarted the service, the node came up on 6.6.2 and all unassigned shards were allocated.
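For anyone else hitting this, the fix on node1 was simply (a sketch, same assumptions as above):

sudo systemctl restart elasticsearch

# confirm the node now reports the upgraded version
curl -s 'localhost:9200/_cat/nodes?h=name,version'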

