Docker swarm and rolling restarts

I'm currently investigating running Elasticsearch with Docker swarm and docker service. One headache is how to orchestrate rolling restarts in this setup. Specifically, I want the cluster to stay green throughout the process, with zero downtime. Rolling restarts are typically needed for config changes, Elasticsearch updates, moving onto new hardware, or cluster trouble (e.g. OOMs with Logstash/Kibana).

It appears Docker swarm supports updating service containers one by one (--update-parallelism 1) and waiting a fixed amount of time in between (--update-delay). This is not ideal, since I've seen node restarts take ages on ES clusters with a lot of Logstash data. So there's a big risk of either not waiting long enough (red cluster, data loss) or waiting too long (wasted time). Also, if there is a problem with one of the nodes, there needs to be a way to go in and fix things. Finally, I'd prefer something automated. I've been babysitting cluster updates manually; it's a pain in the ass every time, and I don't trust myself to get it right every time.
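A minimal sketch of the kind of automation described above, assuming a Python driver script and the standard Elasticsearch _cluster/health endpoint on a hypothetical http://localhost:9200. Instead of a fixed delay between updates, it waits until the cluster reports green after each node restart and stops at the first node that doesn't get there, so someone can go in and fix things. The names wait_for_green, rolling_restart and the restart_node callback are illustrative, not part of any Docker or Elasticsearch API; how restart_node actually restarts a container (for example, one swarm service per ES node) is left open.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical address; point it at any ES node reachable from the script.
ES_URL = "http://localhost:9200"


def fetch_cluster_status(base_url: str = ES_URL) -> str:
    """Return "green", "yellow" or "red" from the _cluster/health API."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)["status"]


def wait_for_green(
    fetch_status: Callable[[], str] = fetch_cluster_status,
    timeout_s: float = 3600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the cluster is green; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch_status() == "green":
                return True
        except OSError:
            # Connection refused etc. while a node restarts: not green yet.
            pass
        if clock() >= deadline:
            return False
        sleep(poll_s)


def rolling_restart(
    nodes: Iterable[str],
    restart_node: Callable[[str], None],  # hypothetical restart hook
    wait: Callable[[], bool] = wait_for_green,
) -> Optional[str]:
    """Restart nodes one at a time, waiting for green after each.

    Returns the first node after which the cluster did not recover in time
    (so an operator can step in), or None if every node went through.
    """
    for node in nodes:
        restart_node(node)
        if not wait():
            return node
    return None
```

The timeout is still a guess, but it now only bounds how long to wait before handing over to a human, rather than deciding when it is safe to move on. Elasticsearch's _cluster/health also accepts wait_for_status=green and timeout query parameters, which would move the waiting to the server side.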

I'd love to hear from people who have figured out a way to do this or any best practices around this.

Did you try docker swarmkit?

Based on the info in this issue

https://github.com/docker/swarmkit/issues/1085

I'd conclude that a Dockerfile HEALTHCHECK command could be a curl/wget/whatever command that retrieves the node's recovery status. I haven't tried it myself; I'm just reading the docs.
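A sketch of such a probe in Python, assuming Python 3 is available inside the image (the Elasticsearch images may not ship it, in which case curl/wget as suggested is simpler) and that the node answers on localhost:9200. It reports healthy only when _cluster/health says green. The function names and the accept_yellow switch are hypothetical. Docker treats exit code 0 from a HEALTHCHECK command as healthy and 1 as unhealthy.

```python
import json
import urllib.request

# Assumption: the probe runs inside the Elasticsearch container, so the
# node's HTTP port is reachable on localhost:9200.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def is_healthy(health: dict, accept_yellow: bool = False) -> bool:
    """Classify a parsed _cluster/health response body."""
    allowed = {"green", "yellow"} if accept_yellow else {"green"}
    return health.get("status") in allowed


def main(url: str = HEALTH_URL, accept_yellow: bool = False) -> int:
    """Return a HEALTHCHECK exit code: 0 = healthy, 1 = unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            health = json.load(resp)
    except (OSError, ValueError):
        # Node not reachable yet (still starting or recovering), an HTTP
        # error status, or a body that isn't valid JSON: report unhealthy.
        return 1
    return 0 if is_healthy(health, accept_yellow) else 1
```

Saved as a script that ends with sys.exit(main()) (path hypothetical, e.g. /usr/local/bin/es_healthcheck.py), it could be wired up with HEALTHCHECK CMD python3 /usr/local/bin/es_healthcheck.py. One caveat: swarm may replace containers it considers unhealthy, so a strict green-only check could take down every node whenever the whole cluster dips to yellow; accept_yellow exists for that case. Whether swarm waits on this status during a rolling update depends on the swarmkit work discussed in the issue, so that part is untested.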

To be honest, I'd expect something like an official Elasticsearch Docker image that makes the best use of the swarmkit features.

Perfect timing:

Thanks, that looks like it partly covers what I need. I'll be playing with this in the next few months probably.

I spent some more time on this, and on the off chance that somebody trying the same thing ends up here: I wrote up an article on how to run Elasticsearch in Docker swarm 1.12.