Docker Updates Causing Issues with the ECE

I started seeing connection issues with ECE right after Google updated Docker to its latest version ("Docker version 19.03.2, build"), and everything in ECE stopped working. I can still log into the UI, but nothing else shows up. Clicking items in the left-side menu returns a few errors, and nothing populates:

-> There was a problem communicating with the system cluster. Origin status code [502 Bad Gateway].
-> Fetching region ece-region failed

Looks like we don't yet support Docker 19.x (see https://www.elastic.co/guide/en/cloud-enterprise/current/ece-prereqs-software.html). I'd contact support to see what can be done to roll back (it may not be pleasant).
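In case it helps anyone catch this earlier next time, here is a minimal Python sketch that parses a `docker --version` string like the one quoted above and checks it against an allow-list. The function names are made up for illustration, and the allow-list must be supplied by the caller from the prerequisites page; the `18.09` value in the usage lines is a placeholder, not the official support matrix.

```python
import re
import subprocess


def parse_docker_version(output):
    """Extract (major, minor, patch) from `docker --version` style output."""
    match = re.search(r"Docker version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(output, supported_releases):
    """Return True if the major.minor release is in the caller's allow-list.

    `supported_releases` is a collection of strings such as "18.09".
    It is an assumption of this sketch: fill it in from the ECE
    software prerequisites page for your ECE version.
    """
    major, minor, _patch = parse_docker_version(output)
    return f"{major}.{minor:02d}" in supported_releases


def read_local_docker_version():
    """Run `docker --version` on this host and return its raw output."""
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Usage with the version string from this thread (allow-list is a placeholder):
placeholder_allow_list = {"18.09"}
print(is_supported("Docker version 19.03.2, build 6a30dfc", placeholder_allow_list))
```

Running something like this from a cron job or a monitoring check would at least flag an unexpected Docker release before the UI starts returning 502s.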

We rolled back the recent Docker updates, and things went back to normal. Any idea when you'll start supporting Docker 19.x?

Glad the rollback wasn't too bad! I'm following up on Docker 19.x plans, but I don't believe work has started yet, so it will be more than one minor release away at the earliest.

Alex

Although Docker 19.x is not supported yet, it's worth mentioning that a Docker daemon upgrade requires performing host maintenance by deleting the runner on the host: https://www.elastic.co/guide/en/cloud-enterprise/current/ece-perform-host-maintenance.html#ece-perform-host-maintenance-delete-runner

@Przemyslaw_H or @Alex_Piggott, we've been running ECE on Google-managed systems (in GCP), and Google keeps updating Docker and other OS-related libraries. We just opened a case with Google to stop the Docker updates from being pushed. I've also noticed that we are currently running Docker version 19.03.2, build 6a30dfc, because Google updated to the latest version again (even after we rolled back the updates and excluded Docker updates in yum.conf).
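Since the yum.conf exclusion came up, here is a rough Python sketch for double-checking that an `exclude` line is present and actually matches the Docker package name. It assumes the exclusion lives in the `[main]` section as a space-separated list of shell globs; the package name `docker-ce` and the sample config text are guesses for illustration, not taken from the hosts in question. It only inspects the config text and says nothing about why an update still came through.

```python
import configparser
import fnmatch


def excluded_patterns(yum_conf_text):
    """Return the glob patterns listed under `exclude` in the [main] section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(yum_conf_text)
    raw = parser.get("main", "exclude", fallback="")
    # Accept commas as separators too, in addition to whitespace.
    return raw.replace(",", " ").split()


def is_excluded(package, yum_conf_text):
    """True if `package` matches any exclude glob in the given config text."""
    return any(
        fnmatch.fnmatchcase(package, pattern)
        for pattern in excluded_patterns(yum_conf_text)
    )


# Hypothetical config contents; on a real host, read /etc/yum.conf instead.
sample_conf = "[main]\ngpgcheck=1\nexclude=docker* containerd.io\n"
print(is_excluded("docker-ce", sample_conf))
```

If this returns False for the package that keeps getting upgraded, the glob does not cover it.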

ECE is working fine, even though Docker was updated from 18.x to 19.x (some time ago) and recently re-updated from 19.02 to 19.03 (Sept 6). Keep in mind that we didn't perform the host maintenance procedure (neither nondestructive nor destructive), so I have no idea why it didn't break.

NOTE: It broke for the first time when Google updated Docker from 19.02 to 19.03 (Sept 4). We then rolled back to 19.02, and two days later (Sept 6) Google pushed the same updates again, so we are on 19.03 today. ECE is working fine. That was the only time it broke; when Google pushed the same updates a second time, ECE didn't break again (weird).

How do we fix this mess now? Should we stay on 19.x or roll back to 18.x? And if we should roll back to 18.x, what's the best way to do it now?

--Thanks