Cluster crash after upgrading Kibana to 6.0

Hello all,

Our cluster was upgraded from 5.6.3 to 6.0.0 and has now crashed and will not recover. I followed the directions and performed a rolling upgrade. Upgrading Kibana to 6.0.0 seems to have triggered some sort of failure: the migration assistant failed to migrate the .kibana index, and since I couldn't figure out why, I deleted it; some time later the cluster began to crash. Below are some cluster stats and the sequence of upgrades:

8 nodes: 6 data nodes, 2 indexer nodes, 3 master-eligible nodes. 64GB RAM per node, with min/max heap both set to 31GB. X-Pack installed with the free license. 2 instances of Kibana, each running on an indexer node.
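To double-check that the topology and heap settings survived the upgrade, the _cat/nodes API can list roles and heap limits. A minimal sketch, assuming the HTTP endpoint is reachable on localhost:9200:

```shell
# List node name, roles (m=master-eligible, d=data, i=ingest),
# configured max heap, and total RAM per node.
curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max'
```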

2017-11-25: Rolling upgrade of all nodes.
2017-11-27: Upgrade of Kibana nodes.

20:29 UTC: "[node1] failed to execute on node [_-9823d-SD...] (..) exception (..) [node2] node not connected"

2017-11-27: At 23:55 UTC node6 crashed and was removed from the cluster. The process was still running and would not terminate without "kill -9". On restart, node6 reported a timeout while discovering the master, despite all cluster services running.

At this point I suspected a problem with the Debian upgrade procedure, so I shut down the entire cluster and reinstalled Elasticsearch on all data nodes. The cluster has now been stuck at 15% shard recovery for over 12 hours.
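To tell whether recovery is actually progressing or genuinely stalled, the cluster health, recovery, and allocation-explain APIs are worth checking. A sketch, assuming localhost:9200:

```shell
# Overall cluster state and count of unassigned shards
curl -s 'localhost:9200/_cluster/health?pretty'

# Only the recoveries currently running, with progress percentages
curl -s 'localhost:9200/_cat/recovery?v&active_only=true'

# Ask the cluster why an unassigned shard is not being allocated
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'
```

If the active-recovery list is empty while health stays red, the problem is allocation (or a missing master), not slow copying.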

Thanks in advance for the help.

I ran the migration assistant prior to the upgrade and it only warned about index-compatibility issues, such as use of the _all field. The 6.0 documentation says old indices are backwards compatible, so I don't believe this has anything to do with the failure.

Unfortunately, when I reinstalled Elasticsearch, the log directory was deleted by the Debian package manager. I only have old logs from the two indexer/Kibana nodes and new logs from the data nodes.

Cluster discovery fails for some reason, even though all nodes are reachable and Elasticsearch is running on each of them.

failed to send join request to master [node3]... reason ... [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]


[2017-11-28T11:18:34,870][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [node3] failed to execute [indices:monitor/stats] on node [....lCtR12bI7...vWyeg]
org.elasticsearch.transport.NodeNotConnectedException: [node6][] Node not connected


[2017-11-28T11:17:40,580][INFO ][o.e.d.z.ZenDiscovery ] [index_node1] failed to send join request to master [{node3}{HIwh.....7.eZ...Vvlort-w}{gJLrV....aoq72q5vbGfA}{}{}{rack=rackA6-1}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
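When join requests time out like this, it's worth confirming that every node agrees on who the master is, and that the discovery settings survived the package reinstall. A sketch (the hostnames come from the logs above; the config path is the Debian default and an assumption):

```shell
# Which node each endpoint believes is the elected master
curl -s 'node3:9200/_cat/master?v'
curl -s 'index_node1:9200/_cat/master?v'

# Discovery settings a package reinstall may have reset to defaults
grep -E 'discovery\.zen\.(ping\.unicast\.hosts|minimum_master_nodes)' \
  /etc/elasticsearch/elasticsearch.yml
```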

Is a downgrade to 5.6.3 possible? I'm OK with losing recent data; I'm more concerned about operational status.

Downgrading did not work. Our cluster is dead.

Do you have full stack traces somewhere, or complete logs? Ideally ones that include a node starting up, so we can see what happens next. What about the master node logs?

Does the cluster form at all, or does it become unstable once you use Kibana with it? Or does it become unstable once recovery has started?

Have you checked the pending tasks? Hot threads? Node stats?
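For reference, those three can be pulled like this (endpoint assumed to be localhost:9200):

```shell
# Cluster-state update tasks queued on the master
curl -s 'localhost:9200/_cluster/pending_tasks?pretty'

# What the busiest threads on each node are doing right now
curl -s 'localhost:9200/_nodes/hot_threads'

# Per-node JVM and thread-pool stats (queues, rejections, heap use)
curl -s 'localhost:9200/_nodes/stats/jvm,thread_pool?pretty'
```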

What exactly happened to node6? Did it respond to anything, like an HTTP request, before you killed it? Any log messages? Any dmesg output?
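A sketch of what's worth capturing the next time a node wedges like node6 did, before killing the process (run on the affected node; the pgrep pattern is an assumption about how the JVM was launched):

```shell
# Does the node still answer HTTP at all? (5-second timeout)
curl -s -m 5 'localhost:9200/' || echo "no HTTP response"

# Kernel messages: look for the OOM killer or disk errors
dmesg -T | grep -Ei 'out of memory|killed process|i/o error' | tail

# Thread dump of the hung JVM, for later analysis
jstack "$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)" > /tmp/es-threads.txt
```

An OOM-killed or I/O-blocked process would explain both the hang and the need for "kill -9".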

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.