Rolling restarts

Are there recommended procedures for rolling restarts of ECE runners in cases of system level patching or hardware maintenance?

Something like:

  • Zone by zone of course
  • Do the control plane first, starting with ZooKeeper
    • Before taking down a ZK node, check that you have exactly 3 or 5 ZK containers and that they are all connected (e.g. there is a ZK status element under settings)
  • If you have sufficient allocator capacity, it is recommended that you put each allocator to be patched into maintenance mode first, then migrate clusters off it, then take it down
    • (In many cases there is not enough capacity to do this. In that case, make sure all non-HA clusters are migrated off, and be aware that you will lose HA during the rolling change)
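To make the ordering above concrete, here is a minimal planning sketch. It only sorts a host inventory and checks the ZK precondition; it does not call any ECE API. The `Host` class, the role names and the sample inventory are hypothetical, and it takes one reading of the list: ZooKeeper hosts first, then the rest of the control plane, then allocators, going one zone at a time within each group.

```python
from dataclasses import dataclass

# Hypothetical role names for this sketch only; they are not ECE identifiers.
ROLE_ORDER = {"zookeeper": 0, "control-plane": 1, "allocator": 2}


@dataclass(frozen=True)
class Host:
    name: str
    zone: str
    role: str  # one of the keys in ROLE_ORDER


def zk_ensemble_ok(connected_flags):
    """True if there are exactly 3 or 5 ZK containers and every one is connected."""
    return len(connected_flags) in (3, 5) and all(connected_flags)


def restart_order(hosts):
    """Sort hosts by role group first (ZooKeeper, control plane, allocators),
    then by zone, then by name, so each group is walked zone by zone."""
    return sorted(hosts, key=lambda h: (ROLE_ORDER[h.role], h.zone, h.name))


if __name__ == "__main__":
    inventory = [
        Host("a1", "zone-1", "allocator"),
        Host("z2", "zone-2", "zookeeper"),
        Host("c1", "zone-1", "control-plane"),
        Host("z1", "zone-1", "zookeeper"),
        Host("a2", "zone-2", "allocator"),
    ]
    for host in restart_order(inventory):
        print(host.zone, host.role, host.name)
```

The connected/not-connected flags for `zk_ensemble_ok` would have to come from whatever you use to read the ZK status; that part is deliberately left out here.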

Also: Docker 1.11 at least has a broken daemon restart, so the recommended way of bringing a host up without ECE running is as follows:

  • Disable the Docker daemon so that it does not start at boot (don't stop it); the exact command depends on the OS
  • Reboot the host to bring it up without ECE
  • Perform whatever changes are required
  • Re-enable the docker daemon once done
  • Reboot the host to bring it back with ECE running
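As a rough sketch of the steps above, assuming a systemd-based host where the Docker unit is named `docker` (both are assumptions; other init systems or unit names need different commands). Because the sequence spans two reboots, it is split into a "before" phase and an "after" phase, with the actual maintenance work done by hand in between. The phase names and the `run_phase` helper are made up for illustration.

```python
import subprocess

# Assumption: systemd host, Docker unit named "docker". Adjust for your OS.
PHASES = {
    # Disable (but do not stop) Docker, then reboot so the host comes up without ECE.
    "before": [["systemctl", "disable", "docker"], ["systemctl", "reboot"]],
    # Re-enable Docker, then reboot so the host comes back with ECE running.
    "after": [["systemctl", "enable", "docker"], ["systemctl", "reboot"]],
}


def run_phase(phase, dry_run=True):
    """Print the commands for a phase, or execute them when dry_run is False.
    Returns the list of commands for the phase."""
    commands = PHASES[phase]
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # needs root privileges
    return commands


if __name__ == "__main__":
    run_phase("before")  # dry run: only prints the commands
```

It defaults to a dry run, so nothing is changed unless `dry_run=False` is passed explicitly.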
