Upgrade ECE from 2.4.3 to 2.6.1

Hi,

I am having some trouble with the ECE upgrade.
Some facts:

  • 6 VMs
  • Fresh cluster install with version 2.4.1
  • Started some deployments, with versions ranging from 6.8.0 to 7.9.1
  • Started the upgrade

Upgrade logs

  • Running Upgrade Initial container
  • Monitoring upgrade process
  • Obtaining elevated permissions
  • Elevated permissions were obtained successfully
  • Checking to see if pending plans exist before upgrade...
  • Checking if all systems clusters are at version 6.8.0 or higher
  • Starting upgrade of Elastic Cloud Enterprise [2.4.3] to [2.6.1]
  • Backing up ZooKeeper's transaction log to [/project/ece-cluster/elastic/192.3.57.2/services/zookeeper/data/backup/20200909-131818/version-2]
  • Initializing upgrade status
  • Creating backup of containers' configuration
  • Ignore the platform's registry because a custom registry is specified [docker.elastic.co].
  • Container [upgraders-upgrader] has been scheduled successfully on all ECE runners [192.3.57.2,192.3.57.5,192.3.56.255,192.3.57.0,192.3.57.1,192.3.56.254]
  • Waiting for upgraders to be started on every host
  • An upgrade service is on-line on runner [192.3.57.1]
  • An upgrade service is on-line on runner [192.3.56.255]
  • An upgrade service is on-line on runner [192.3.56.254]
  • An upgrade service is on-line on runner [192.3.57.5]
  • An upgrade service is on-line on runner [192.3.57.0]
  • An upgrade service is on-line on runner [192.3.57.2]
  • All upgraders are on-line. Monitoring the upgrade process
  • [192.3.57.2]: step [before-all] status [upgrade started] at [2020-09-09T13:18:33.239Z]
  • [192.3.57.2]: step [before-all] status [upgrade successful] at [2020-09-09T13:18:34.067Z]
  • [192.3.57.2]: step [Container(runners-runner)] status [upgrade started] at [2020-09-09T13:18:39.082Z]
  • [192.3.57.2]: step [Container(runners-runner)] status [upgrade successful] at [2020-09-09T13:18:44.497Z]
  • [192.3.56.254]: step [Container(runners-runner)] status [upgrade started] at [2020-09-09T13:18:48.098Z]
  • [192.3.56.254]: step [Container(runners-runner)] status [upgrade successful] at [2020-09-09T13:18:53.330Z]
  • [192.3.56.255]: step [Container(runners-runner)] status [upgrade started] at [2020-09-09T13:18:57.455Z]
  • [192.3.56.255]: step [Container(runners-runner)] status [upgrade successful] at [2020-09-09T13:19:02.644Z]
  • [192.3.57.0]: step [Container(runners-runner)] status [upgrade started] at [2020-09-09T13:19:03.221Z]
  • [192.3.57.0]: step [Container(runners-runner)] status [upgrade successful] at [2020-09-09T13:19:08.433Z]
  • [192.3.57.1]: step [Container(runners-runner)] status [upgrade started] at [2020-09-09T13:19:11.800Z]
  • [192.3.57.1]: step [Container(runners-runner)] status [upgrade successful] at [2020-09-09T13:19:16.940Z]
  • [192.3.57.5]: step [Container(runners-runner)] status [upgrade started] at [2020-09-09T13:19:18.191Z]
  • [192.3.57.5]: step [Container(runners-runner)] status [upgrade successful] at [2020-09-09T13:19:23.394Z]
  • [192.3.57.2]: step [Container(directors-director)] status [upgrade started] at [2020-09-09T13:19:24.523Z]
  • [192.3.57.2]: step [Container(directors-director)] status [upgrade successful] at [2020-09-09T13:19:59.555Z]
  • [192.3.57.1]: step [Container(directors-director)] status [upgrade started] at [2020-09-09T13:20:01.954Z]
  • [192.3.57.1]: step [Container(directors-director)] status [upgrade failed] at [2020-09-09T13:20:01.981Z]. Message: [Container [directors-director] doesn't exist and feature flag is not configured).]
  • [192.3.57.2]: step [Container(directors-director)] status [rollback started] at [2020-09-09T13:20:05.175Z]
  • [192.3.57.2]: step [Container(directors-director)] status [rollback successful] at [2020-09-09T13:20:05.768Z]
  • [192.3.57.1]: step [Container(directors-director)] status [rollback started] at [2020-09-09T13:20:07.575Z]
  • [192.3.57.1]: step [Container(directors-director)] status [rollback successful] at [2020-09-09T13:20:07.600Z]
  • [192.3.57.2]: step [Container(runners-runner)] status [rollback started] at [2020-09-09T13:20:10.780Z]
  • [192.3.57.2]: step [Container(runners-runner)] status [rollback successful] at [2020-09-09T13:20:11.098Z]
  • [192.3.57.5]: step [Container(runners-runner)] status [rollback started] at [2020-09-09T13:20:14.059Z]
  • [192.3.57.5]: step [Container(runners-runner)] status [rollback successful] at [2020-09-09T13:20:14.407Z]
  • [192.3.57.1]: step [Container(runners-runner)] status [rollback started] at [2020-09-09T13:20:17.612Z]
  • [192.3.57.1]: step [Container(runners-runner)] status [rollback successful] at [2020-09-09T13:20:17.907Z]
  • [192.3.57.0]: step [Container(runners-runner)] status [rollback started] at [2020-09-09T13:20:19.098Z]
  • [192.3.57.0]: step [Container(runners-runner)] status [rollback successful] at [2020-09-09T13:20:19.443Z]
  • [192.3.56.255]: step [Container(runners-runner)] status [rollback started] at [2020-09-09T13:20:23.310Z]
  • [192.3.56.255]: step [Container(runners-runner)] status [rollback successful] at [2020-09-09T13:20:23.640Z]
  • [192.3.56.254]: step [Container(runners-runner)] status [rollback started] at [2020-09-09T13:20:24.010Z]
  • [192.3.56.254]: step [Container(runners-runner)] status [rollback successful] at [2020-09-09T13:20:24.369Z]
  • [192.3.57.2]: step [before-all] status [rollback started] at [2020-09-09T13:20:26.114Z]
  • [192.3.57.2]: step [before-all] status [rollback successful] at [2020-09-09T13:20:26.749Z]

On the machine with the error there is a container named frc-directors-director ...

After disabling and re-enabling the director role on that machine, the upgrade was successful.

Hi @ece_master

Welcome to the community.

Since you are running ECE, you have a commercial license, which also means you have a support subscription. These kinds of issues are exactly what support can help with, so you should file a support ticket.

We are currently evaluating ECE to determine whether it is the right fit for us.

Understood. Even during ECE evaluation you could get help from a Solution Architect if you wanted.

You can PM me directly if you are interested, and I can put you in contact with the right person. If not, that is OK as well; just keep in mind that this forum is maintained mostly by volunteers.

Looks like this is the failing step:

  • [192.3.57.1]: step [Container(directors-director)] status [upgrade failed] at [2020-09-09T13:20:01.981Z]. Message: [Container [directors-director] doesn't exist and feature flag is not configured).]
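Lines like the one above can be picked out of a longer upgrade log with a short script. This is only a minimal sketch: the regular expression is inferred from the log lines posted in this thread, not from any documented ECE log format, and the names `STEP_LINE`, `find_failures` and `SAMPLE_LOG` are made up for illustration.

```python
import re

# Pattern inferred from the log lines in this thread (an assumption,
# not an official format): [host]: step [..] status [..] at [..]. Message: [..]
STEP_LINE = re.compile(
    r"\[(?P<host>[^\]]+)\]: step \[(?P<step>[^\]]+)\] "
    r"status \[(?P<status>[^\]]+)\] at \[(?P<time>[^\]]+)\]"
    r"(?:\. Message: \[(?P<message>.*)\])?"
)


def find_failures(log_text):
    """Return one dict per step line whose status ends with 'failed'."""
    failures = []
    for line in log_text.splitlines():
        match = STEP_LINE.search(line)
        if match and match.group("status").endswith("failed"):
            failures.append(match.groupdict())
    return failures


# Two lines copied from the upgrade log in this thread.
SAMPLE_LOG = """\
  • [192.3.57.2]: step [Container(directors-director)] status [upgrade successful] at [2020-09-09T13:19:59.555Z]
  • [192.3.57.1]: step [Container(directors-director)] status [upgrade failed] at [2020-09-09T13:20:01.981Z]. Message: [Container [directors-director] doesn't exist and feature flag is not configured).]
"""

if __name__ == "__main__":
    for failure in find_failures(SAMPLE_LOG):
        print(failure["host"], failure["step"], failure["message"])
```

Running it against the full log narrows the problem down to a host and a step, which is where to look next.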

The first thing I notice from the docs is that the recommended upgrade path is 2.4.3 to 2.5.1 to 2.6.1; that would be the first thing I would try.

https://www.elastic.co/guide/en/cloud-enterprise/current/ece-upgrade.html

Upgrade your system deployments

For Elastic Cloud Enterprise 2.6.0, you must upgrade your system deployments to version 6.8 before proceeding. To ensure the smoothest upgrade process, we recommend first upgrading ECE to at least version 2.5.0 if it is not already.
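As a rough illustration of the two conditions quoted above, here is a small pre-flight check. It is a sketch under assumptions: the thresholds (ECE 2.5.0 and system deployments at 6.8) come from the quoted docs, applying them to every target from 2.6.0 onward is my own reading, and the function names are hypothetical rather than part of any ECE tooling.

```python
def parse_version(text):
    """Turn a dotted version such as '2.4.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def upgrade_warnings(current_ece, target_ece, system_deployment_versions):
    """Return human-readable warnings for an ECE upgrade plan.

    Assumption: the 2.6.0 guidance is applied to any target >= 2.6.0.
    """
    warnings = []
    if parse_version(target_ece) >= (2, 6, 0):
        # Recommended intermediate step from the docs quoted above.
        if parse_version(current_ece) < (2, 5, 0):
            warnings.append(
                f"ECE {current_ece} is below 2.5.0; consider upgrading "
                f"to 2.5.x before {target_ece}"
            )
        # Hard requirement from the docs: system deployments at 6.8.
        for version in system_deployment_versions:
            if parse_version(version) < (6, 8):
                warnings.append(
                    f"system deployment at {version} must be upgraded to 6.8 first"
                )
    return warnings


if __name__ == "__main__":
    for warning in upgrade_warnings("2.4.3", "2.6.1", ["6.8.0"]):
        print(warning)
```

For the setup in this thread (2.4.3 to 2.6.1 with system deployments already at 6.8.0) it would flag only the skipped 2.5.x step.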

Hi @ece_master,
Can you ssh to runner "192.3.57.1" and look at "//logs/upgrader-logs/upgrader.log" (or similar, I do not remember the precise path)? Check what happened during the director upgrade.

Usually, these are signs of an unhealthy runner, because the container was scheduled but did not run. The fact that you found it later can be explained by the restart of the runner service that the upgrader performs during the rollback: when the runner was restarted, it re-read the configuration and spun up the missing director.
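To see whether the director container is actually present on a host, and in what state, something like the following could be run on the runner. It is a minimal sketch that assumes the Docker CLI is available to the current user. The `--filter name=...` and `--format` options are standard `docker ps` options, while the default filter value and the helper names are only illustrative.

```python
import subprocess


def parse_docker_ps(output):
    """Parse 'name<TAB>status' lines into a {name: status} dict."""
    containers = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        containers[name.strip()] = status.strip()
    return containers


def list_containers(name_filter="directors-director"):
    """List containers (running or stopped) whose name contains name_filter."""
    result = subprocess.run(
        [
            "docker", "ps", "-a",
            "--filter", f"name={name_filter}",
            "--format", "{{.Names}}\t{{.Status}}",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_ps(result.stdout)


if __name__ == "__main__":
    found = list_containers()
    if not found:
        print("no director container found on this host")
    for name, status in found.items():
        print(f"{name}: {status}")
```

An empty result before the upgrade on a host that holds the director role would be consistent with the "doesn't exist" message in the log.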

Stephen, Yuri, thanks for the advice. I'm going to reinstall the cluster and test the upgrade again.
