I upgraded the data nodes (data=true, master=false) and stood up two new master nodes (data=false, master=true). My final step was to bounce the old master node, in the hope that a new master (running the new version) would be elected. A new master was elected, but it is raising many errors.
What do I do now?
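For context, here is roughly how I have been sanity-checking the cluster between upgrade steps (the hostname is a placeholder; any reachable node should work):

```python
import requests

ES = "http://localhost:9200"  # placeholder: any reachable node

# Which node is the elected master, and what version is each node running?
r = requests.get(f"{ES}/_cat/nodes", params={"v": "true", "h": "name,node.role,version,master"})
print(r.text)

# Overall cluster health, including the unassigned shard count.
print(requests.get(f"{ES}/_cluster/health", params={"pretty": "true"}).text)
```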
(some stack trace lines removed because of the posting limit)
[2019-01-20T16:58:57,244][WARN ][o.e.g.G.InternalReplicaShardAllocator] [master1] [unittest20190113_000000][6]: failed to list shard for shard_store on node [HrTEqeTNRZW8OfSEJ3Y2DA]
org.elasticsearch.action.FailedNodeException: Failed node [HrTEqeTNRZW8OfSEJ3Y2DA]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:237) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:153) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:211) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1130) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.transport.TcpTransport.lambda$handleException$32(TcpTransport.java:1268) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:135) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1266) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1258) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1188) [elasticsearch-6.5.4.jar:6.5.4]
...
Caused by: org.elasticsearch.transport.RemoteTransportException: [spot_54.190.10.44][172.31.1.87:9300][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.ElasticsearchException: Failed to list store metadata for shard [[unittest20190113_000000][6]]
Caused by: java.io.FileNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(MMapDirectory@/data2/nodes/0/indices/bbMLDFi5Qt2Z3anblhTX-Q/6/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@6789a88d)): files: [recovery.BCjzSmBfTdmezeG79jnG9Q._1ll.dii, recovery.BCjzSmBfTdmezeG79jnG9Q._1ll.dim, recovery.BCjzSmBfTdmezeG79jnG9Q.segments_68, write.lock]
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:640) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:442) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:131) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:201) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.index.store.Store.access$200(Store.java:129) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:851) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:784) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.index.store.Store.getMetadata(Store.java:287) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.index.shard.IndexShard.snapshotStoreMetadata(IndexShard.java:1176) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:127) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:111) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:260) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:256) ~[elasticsearch-6.5.4.jar:6.5.4]
... 1 more
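If I read the FileNotFoundException correctly, that shard directory contains only recovery.* temporary files and a write.lock, i.e. a copy whose recovery never completed, so there is no segments_N commit for the store listing to read. To see how the cluster views the copies of that shard, I am poking at it with something like this sketch (hostname is a placeholder; the index name and shard number come from the log above):

```python
import json
import requests

ES = "http://localhost:9200"  # placeholder: any reachable node

# Store status of every copy of the index's shards, including broken ones.
r = requests.get(f"{ES}/unittest20190113_000000/_shard_stores", params={"status": "all"})
print(json.dumps(r.json(), indent=2))

# Ask the allocator why shard 6's primary is (or is not) assigned.
body = {"index": "unittest20190113_000000", "shard": 6, "primary": True}
r = requests.get(f"{ES}/_cluster/allocation/explain", json=body)
print(json.dumps(r.json(), indent=2))
```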
The other new master seems to be waiting for something:
[2019-01-20T17:08:12,465][INFO ][o.e.x.m.e.l.LocalExporter] [master2] waiting for elected master node [{master1}{68sPQmYrRdW60YXSGUeT2w}{1mXMJUiKQNSezNiFgOF9lQ}{172.31.1.13}{172.31.1.13:9300}{ml.machine_memory=2090577920, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true, zone=primary}] to setup local exporter [default_local] (does it have x-pack installed?)
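The exporter message itself asks whether the elected master has x-pack installed, so I am checking that directly (assuming master1 serves HTTP on the default port 9200; only its 9300 transport address appears in the log):

```python
import json
import requests

# master1's transport address is 172.31.1.13:9300 above; assuming HTTP on 9200.
r = requests.get("http://172.31.1.13:9200/_xpack")
print(json.dumps(r.json(), indent=2))  # lists which x-pack features are installed/enabled
```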
The old master does not seem to be in good shape either:
[2019-01-20T16:55:05,335][INFO ][o.e.n.Node ] [coordinator6] started
[2019-01-20T16:55:07,687][ERROR][i.n.u.ResourceLeakDetector] LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.
[2019-01-20T16:59:08,667][WARN ][o.e.t.TransportService ] [coordinator6] Received response for a request that has timed out, sent [242977ms] ago, timed out [212976ms] ago, action [internal:discovery/zen/fd/master_ping], node [{master1}{68sPQmYrRdW60YXSGUeT2w}{1mXMJUiKQNSezNiFgOF9lQ}{172.31.1.13}{172.31.1.13:9300}{ml.machine_memory=2090577920, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true, zone=primary}], id [264]
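Since the set of master-eligible nodes changed during this upgrade, I am also double-checking discovery.zen.minimum_master_nodes; with three master-eligible nodes it should be 2, and a stale value would make the master_ping timeouts above less surprising. A minimal check (hostname is a placeholder):

```python
import json
import requests

ES = "http://localhost:9200"  # placeholder: any reachable node

r = requests.get(
    f"{ES}/_cluster/settings",
    params={"include_defaults": "true", "filter_path": "**.minimum_master_nodes"},
)
print(json.dumps(r.json(), indent=2))
```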