Can't upgrade elasticsearch 5.2.1 to 5.6.5


(Arter Xu) #1

Old Version:
Elasticsearch version : 5.2.1
JVM version : 1.8.0_71
OS version : centos 7.3
New Version:
Elasticsearch version : 5.6.5
JVM version : 1.8.0_131
OS version : centos 7.3

The details:
I have six Elasticsearch clusters of 20 nodes. I plan to upgrade them from 5.2.1 to 5.6.5 through a rolling upgrade, following https://www.elastic.co/guide/en/elasticsearch/reference/5.6/rolling-upgrades.html.
Five clusters upgraded successfully, but one of the six cannot be upgraded. When a node of the problem cluster is upgraded, it can't rejoin the old cluster.
My English is poor, sorry.

Before upgrade:
curl localhost:9200
{
  "name" : "xg-ops-elk-javaes-mgt-3",
  "cluster_name" : "xg-ops-elk-javaes-cluster",
  "cluster_uuid" : "W_xR97SqQ66yHEq9bQUDZQ",
  "version" : {
    "number" : "5.2.1",
    "build_hash" : "db0d481",
    "build_date" : "2017-02-09T22:05:32.386Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}
After upgrade and restart:
curl localhost:9200
{
  "name" : "xg-ops-elk-javaes-mgt-3",
  "cluster_name" : "xg-ops-elk-javaes-cluster",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "5.6.5",
    "build_hash" : "6a37571",
    "build_date" : "2017-12-04T07:50:10.466Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
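
The `cluster_uuid` of `_na_` in the second response is the placeholder a node reports before it has received a cluster state from a master, so it is a quick sign that the upgraded node has not joined. A minimal Python sketch of that check, assuming the response body of `curl localhost:9200` has already been fetched as a string (the function name `has_joined_cluster` is hypothetical, for illustration only):

```python
import json

# Placeholder UUID a node reports before it has joined a cluster.
UNKNOWN_CLUSTER_UUID = "_na_"


def has_joined_cluster(root_response: str) -> bool:
    """Return True if the body of `GET /` shows a real cluster UUID."""
    info = json.loads(root_response)
    return info.get("cluster_uuid", UNKNOWN_CLUSTER_UUID) != UNKNOWN_CLUSTER_UUID


# Trimmed copies of the two responses shown above.
before = '{"name": "xg-ops-elk-javaes-mgt-3", "cluster_uuid": "W_xR97SqQ66yHEq9bQUDZQ"}'
after = '{"name": "xg-ops-elk-javaes-mgt-3", "cluster_uuid": "_na_"}'
print(has_joined_cluster(before), has_joined_cluster(after))
```
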
The log is:
[2018-01-14T16:33:13,125][INFO ][o.e.n.Node               ] [xg-ops-elk-javaes-mgt-3] initialized
[2018-01-14T16:33:13,125][INFO ][o.e.n.Node               ] [xg-ops-elk-javaes-mgt-3] starting ...
[2018-01-14T16:33:13,265][INFO ][o.e.t.TransportService   ] [xg-ops-elk-javaes-mgt-3] publish_address {10.0.23.55:9300}, bound_addresses {127.0.0.1:9300}, {10.0.23.55:9300}
[2018-01-14T16:33:13,274][INFO ][o.e.b.BootstrapChecks    ] [xg-ops-elk-javaes-mgt-3] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-14T16:33:23,746][INFO ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:33:34,121][INFO ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:33:43,293][WARN ][o.e.n.Node               ] [xg-ops-elk-javaes-mgt-3] timed out while waiting for initial discovery state - timeout: 30s
[2018-01-14T16:33:43,303][INFO ][o.e.h.n.Netty4HttpServerTransport] [xg-ops-elk-javaes-mgt-3] publish_address {10.0.23.55:9200}, bound_addresses {127.0.0.1:9200}, {10.0.23.55:9200}
[2018-01-14T16:33:43,303][INFO ][o.e.n.Node               ] [xg-ops-elk-javaes-mgt-3] started
[2018-01-14T16:33:44,475][INFO ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:33:49,162][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] no known master node, scheduling a retry
.....
[2018-01-14T16:40:59,129][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] no known master node, scheduling a retry
[2018-01-14T16:41:08,687][INFO ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:41:09,134][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] no known master node, scheduling a retry
[2018-01-14T16:41:09,142][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-01-14T16:41:09,142][WARN ][r.suppressed             ] path: /_cluster/health, params: {level=indices}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.5.jar:5.6.5]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

Please help me; I don't know how to solve this issue. Thanks.


(David Pilato) #2

Please format your code using </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

Did you upgrade all nodes? What are the other logs?


(Arter Xu) #3

What other information can I provide?


(David Pilato) #4

What I asked for:

Did you upgrade all nodes? What are the other logs?


(Arter Xu) #5

No, I am doing a rolling upgrade, so the nodes are upgraded one by one. The other logs have been updated today.
I mean, what other information do you need to help me? Thanks.


(David Pilato) #6

The logs of the current master node


(Arter Xu) #7

The logs of the current master node are below:

[2018-01-14T17:36:24,392][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
	at org.elasticsearch.cluster.node.DiscoveryNodeFilters.buildFromKeyValue(DiscoveryNodeFilters.java:73) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.metadata.IndexMetaData$Builder.build(IndexMetaData.java:1044) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.metadata.IndexMetaData.readFrom(IndexMetaData.java:724) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.metadata.MetaData.readFrom(MetaData.java:676) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.cluster.ClusterState.readFrom(ClusterState.java:659) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.discovery.zen.MembershipAction$ValidateJoinRequest.readFrom(MembershipAction.java:171) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1510) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1396) ~[elasticsearch-5.2.1.jar:5.2.1]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_73]
[2018-01-14T17:36:34,728][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException

(Arter Xu) #8

The error keeps repeating, but I can't understand where the problem is.

[2018-01-14T17:41:55,105][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
[2018-01-14T17:42:05,469][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
[2018-01-14T17:42:15,797][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
[2018-01-14T17:42:26,198][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
[2018-01-14T17:42:36,521][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
[2018-01-14T17:42:46,872][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException
[2018-01-14T17:42:57,229][WARN ][o.e.d.z.ZenDiscovery     ] [xg-ops-elk-javaes-mgt-2] failed to validate incoming join request from node [{xg-ops-elk-javaes-mgt-3}{SQaSuQ1aS-izcNs4P9yItQ}{_Wc_mrnfS5Ghb3RjuJLiJg}{10.0.23.55}{10.0.23.55:9300}]
org.elasticsearch.transport.RemoteTransportException: [xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]
Caused by: java.lang.NullPointerException

(David Pilato) #9

What are your elasticsearch.yml settings?


(Arter Xu) #10

master node:

cluster.name: xg-ops-elk-javaes-cluster
node.name: xg-ops-elk-javaes-mgt-2
node.master: true
node.data: false
path.data: /data/es_data/
path.logs: /data/es_log/
path.conf: /opt/elasticsearch/config
bootstrap.memory_lock: true
network.host: ["10.0.19.68","127.0.0.1"]
transport.tcp.compress: true
transport.tcp.port: 9300
http.port: 9200
http.max_content_length: 100mb
discovery.zen.ping.unicast.hosts: [xg-ops-elk-javaes-mgt-1,xg-ops-elk-javaes-mgt-2,xg-ops-elk-javaes-mgt-3,'xg-ops-elk-javaes-1:9301', 'xg-ops-elk-javaes-2:9301', 'xg-ops-elk-javaes-3:9301', 'xg-ops-elk-javaes-4:9301', 'xg-ops-elk-javaes-5:9301', 'xg-ops-elk-javaes-6:9301', 'xg-ops-elk-javaes-7:9301', 'xg-ops-elk-javaes-8:9301', 'xg-ops-elk-javaes-9:9301', 'xg-ops-elk-javaes-10:9301', 'xg-ops-elk-javaes-11:9301', 'xg-ops-elk-javaes-12:9301', 'xg-ops-elk-javaes-13:9301', 'xg-ops-elk-javaes-14:9301', 'xg-ops-elk-javaes-15:9301', 'xg-ops-elk-javaes-16:9301', 'xg-ops-elk-javaes-17:9301', 'xg-ops-elk-javaes-18:9301', 'xg-ops-elk-javaes-19:9301', 'xg-ops-elk-javaes-20:9301']
#discovery.zen.ping.unicast.hosts: [xg-ops-elk-javaes-mgt-1,xg-ops-elk-javaes-mgt-2,xg-ops-elk-javaes-mgt-3]
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 60s
cluster.routing.allocation.same_shard.host: true

http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length, X-Auth-Token

upgraded node:

cluster.name: xg-ops-elk-javaes-cluster
node.name: xg-ops-elk-javaes-mgt-3
node.master: true
node.data: false
path.data: /data/es_data/
path.logs: /data/es_log/
path.conf: /opt/elasticsearch/config
bootstrap.memory_lock: true
network.host: ["10.0.23.55","127.0.0.1"]
transport.tcp.compress: true
transport.tcp.port: 9300
http.port: 9200
http.max_content_length: 100mb
discovery.zen.ping.unicast.hosts: [xg-ops-elk-javaes-mgt-1,xg-ops-elk-javaes-mgt-2,xg-ops-elk-javaes-mgt-3]
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 10s
cluster.routing.allocation.same_shard.host: true
indices.fielddata.cache.size: 30%

http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length, X-Auth-Token

(Arter Xu) #11

The other clusters, which have the same config, upgraded successfully.
This cluster is strange.


(David Pilato) #12

I'd put

discovery.zen.ping.unicast.hosts: [xg-ops-elk-javaes-mgt-1,xg-ops-elk-javaes-mgt-2,xg-ops-elk-javaes-mgt-3]

On all nodes, not only data nodes.

But that is probably not the issue.

I think that it's related to some index settings like index.routing.allocation.exclude.XXX. Are you using that in your index settings? If so, which one?


(Arter Xu) #13

Yes, I am using index.routing.allocation.exclude.XXX in some index settings. Maybe that is the issue; I will check it and solve it.
If you know a way to solve the issue, please tell me.
By the way, is it a bug?
Thank you very much! Good luck.


(Arter Xu) #14

Maybe the problem is related to the template settings.
Not all nodes have the following settings:

node.attr.zone: zone_one
node.attr.disk: ssd
or
node.attr.zone: zone_one
node.attr.disk: sata

so I use include and exclude settings on the indices.
The templates:

default:
{
  "order": 0,
  "template": "*",
  "settings": {
    "index": {
      "routing": {
        "allocation": {
          "exclude": {
            "disk": "sata"
          }
        }
      }...
other template:
{
  "order": 1,
  "template": "apm.store_tranport-*",
  "settings": {
    "index": {
      "routing": {
        "allocation": {
          "include": {
            "disk": "sata"
          },
          "exclude": {
            "disk": null
          }
        }
      }
...

I think I can add the node.attr.disk: ssd setting to the nodes that need it, restart those nodes, and then change exclude to include, which may solve the issue.
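
To see how the two templates above can combine into one index's settings, here is a simplified Python model, not Elasticsearch's actual implementation. It assumes templates are applied in ascending `order` and that a later template's value replaces an earlier one for the same flattened key, including an explicit `null`. The helper names `flatten` and `merge_templates` are hypothetical.

```python
def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys, keeping None values."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def merge_templates(templates):
    """Apply templates in ascending order; later values win per key."""
    merged = {}
    for template in sorted(templates, key=lambda t: t["order"]):
        merged.update(flatten(template["settings"]))
    return merged


# The two templates from this post, with JSON null written as None.
default = {"order": 0, "settings": {"index": {"routing": {"allocation": {
    "exclude": {"disk": "sata"}}}}}}
apm = {"order": 1, "settings": {"index": {"routing": {"allocation": {
    "include": {"disk": "sata"}, "exclude": {"disk": None}}}}}}

for key, value in sorted(merge_templates([apm, default]).items()):
    print(key, "=", value)
```

Under these assumptions, an index matching both templates ends up with `include.disk` set to `sata` plus an `exclude.disk` key whose value is null, which is the kind of null-valued filter worth checking for.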


(David Pilato) #15

That'd be awesome to add the settings everywhere and confirm.

Yes to me it's a bug. But let's continue digging before opening the issue.

Thanks for the tests so far!


(Arter Xu) #16

Dear dadoonet:
Thanks for helping me.
I have found the root cause thanks to your support.
Below is my description:
Through my tests, I confirmed that the templates are the root cause. The default and apm.store_tranport-* templates above create index settings like this:

 "include": {
            "disk": "sata"
          },
          "exclude": {
            "disk": null
          }

rather than

 "include": {
            "disk": "sata"
          }

The "disk": null entry should not appear in the index settings.

For now I work around it from the shell:

curl -XPUT '127.0.0.1:9200/index/_settings' -d '{
    "settings": {
        "index": {
            "routing": {
                "allocation": {
                    "exclude": {
                        "disk": null
                    }
                }
            }
        }
    }
}'

After that, "disk": null disappears from the index settings, and the upgraded node can join the cluster.
In short, the upgraded node cannot join the cluster when an index has include.xxx or exclude.xxx set to null; when the value is not null, everything works normally.
I will clean up my template settings.
I believe you can fix this issue afterwards.
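
A check like the following could list every index that still carries a null allocation filter before the next node is upgraded. This is only a sketch in Python: it assumes the input is the parsed JSON of an index settings response in nested form (index name, then `settings`, then `index`), and `find_null_allocation_filters` is a hypothetical helper, not part of any Elasticsearch client.

```python
def find_null_allocation_filters(settings_response):
    """Return (index, filter_type, attribute) for each null-valued filter."""
    hits = []
    for index_name, body in settings_response.items():
        allocation = (
            body.get("settings", {})
            .get("index", {})
            .get("routing", {})
            .get("allocation", {})
        )
        for filter_type in ("include", "exclude", "require"):
            for attribute, value in (allocation.get(filter_type) or {}).items():
                if value is None:
                    hits.append((index_name, filter_type, attribute))
    return hits


# Sample shaped like the settings described in this thread.
# Both index names are made up for illustration.
sample = {
    "apm.store_tranport-2018.01.14": {"settings": {"index": {"routing": {"allocation": {
        "include": {"disk": "sata"}, "exclude": {"disk": None}}}}}},
    "other-index": {"settings": {"index": {"routing": {"allocation": {
        "exclude": {"disk": "sata"}}}}}},
}
print(find_null_allocation_filters(sample))
```

Each reported index is a candidate for the settings update shown above.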

Thanks again for your help!


(David Pilato) #17

I opened this: