Cluster red state after 1.4 to 1.7 update?

jeffevans · October 26, 2015, 8:10pm

Hi,

We are preparing to update our server ES version from 1.4.4 to 1.7.3. We had an understanding in talking to various people that this should be relatively straightforward and low risk, if accomplished via a cluster rolling restart. We did the update in our (much smaller) testing environment, and the cluster state went red for several minutes, which gives us pause in preparing to do the same in production.

Here are the relevant log lines from the first node that was restarted in testing, after which the cluster went into a red state for around 15 minutes.

http://pastebin.com/mQycSRAB

The worrying bit to me is the ElasticsearchIllegalStateException. Our production cluster has 3923 total shards running on 20 nodes, with 85 TB of data. We had been planning to accomplish the update by a rolling restart of the cluster. But I wanted to make sure we weren't setting ourselves up for a long downtime while this process happens. Any insight is appreciated.

niraj_kumar · October 26, 2015, 9:39pm

Hi jeffevans,

How many nodes do you have in the test cluster? Also, I hope you have read the notes on upgrading from 1.4 to 1.7:
https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades

This upgrade updates one node at a time and thus avoids downtime.

Regards
Niraj

jeffevans · October 26, 2015, 9:57pm

Niraj,

There are 3 nodes. I checked with our system admin, and the exact rolling restart procedures you linked to were followed, except that the disabling of shard reallocation was not done (and also, the cluster did not attempt to reallocate any shards during the restart). So here was the exact sequence.

Node 1 restart
Cluster state goes to red for a while
Node 2 restart
Cluster state turned to green
Node 3 restart
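
For reference, the step that was skipped (disabling shard allocation before each node restart, then re-enabling it and waiting for the cluster to recover) could be sketched roughly as below. This is only a sketch under assumptions: the base URL `http://localhost:9200` and all function names are hypothetical, and it relies on the `cluster.routing.allocation.enable` setting and the `_cluster/health` endpoint's `wait_for_status` parameter as described in the rolling upgrade notes.

```python
import json
import urllib.request

# Assumption: address of any reachable node in the cluster.
ES_URL = "http://localhost:9200"


def build_allocation_request(base_url, enable):
    """Build a PUT /_cluster/settings request toggling shard allocation."""
    value = "all" if enable else "none"
    body = {"transient": {"cluster.routing.allocation.enable": value}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def health_url(base_url, status="green", timeout="60s"):
    """URL that blocks until the cluster reaches `status` or `timeout` passes."""
    return "{}/_cluster/health?wait_for_status={}&timeout={}".format(
        base_url.rstrip("/"), status, timeout
    )


def set_allocation(base_url, enable):
    """Send the settings request and return the parsed JSON response."""
    with urllib.request.urlopen(build_allocation_request(base_url, enable)) as resp:
        return json.load(resp)


def wait_for_status(base_url, status="green", timeout="60s"):
    """Return the cluster health document; check its "status" field yourself."""
    with urllib.request.urlopen(health_url(base_url, status, timeout)) as resp:
        return json.load(resp)
```

Per node, the idea would be: `set_allocation(ES_URL, False)`, stop, upgrade and start the node, then `set_allocation(ES_URL, True)` and `wait_for_status(ES_URL)` before moving on to the next node.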

Here is the full log output from all 3 nodes during the rolling restart. The time of concern is from 11:46 until 11:54, when Node 1 ("stage-es1") failed to join the cluster, which I believe is why the cluster was in a red state.

http://pastebin.com/kWjXVM3F

niraj_kumar · October 27, 2015, 12:03am

The process you followed seems good enough for the upgrade; the only difference is that I had shard allocation disabled in my case. Also, from past experience, if you are on AWS and use the cloud-aws plugin, the cluster join is much faster than with zen. The failure to contact the node could be a network issue as well. I feel you can go ahead with the production upgrade.

--Niraj
