I'm currently investigating running Elasticsearch on Docker Swarm using docker service. One headache is how to orchestrate rolling restarts with this setup. Specifically, I want the cluster to stay green throughout the process, with zero downtime. Rolling restarts are typically needed for config changes, Elasticsearch updates, bringing new hardware into use, or cluster trouble (e.g. OOMs with Logstash/Kibana).
It appears Docker Swarm supports updating service containers one by one and waiting a specified amount of time in between. This is not ideal, since I've seen node restarts take ages on ES clusters holding a lot of Logstash data. So there's a big risk of either not waiting long enough (red cluster, data loss) or waiting too long (wasted time). Also, if there is a problem with one of the nodes, there needs to be a way to go in and fix things. Finally, I'd prefer something automated. I've been babysitting cluster updates manually, and it's a pain in the ass every time; plus, I don't trust myself to do it right every time.
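To make the idea concrete, here is a rough sketch in Python of what I'm after: gate each restart on actual cluster health instead of a fixed delay. It polls Elasticsearch's `GET /_cluster/health` endpoint and reads its `status` field. Everything else is an assumption for illustration: the base URL, the function names, the timeouts, and especially `restart_node`, which is a hypothetical callback standing in for whatever actually restarts one container. This is not a tested Swarm integration.

```python
import json
import time
import urllib.request


def fetch_health(base_url, timeout_s=10):
    """GET /_cluster/health and return the parsed JSON body."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


def wait_for_green(base_url, fetch=fetch_health, poll_s=5.0,
                   max_wait_s=3600.0, sleep=time.sleep):
    """Poll until the cluster reports green.

    Returns True once green, False if max_wait_s of polling elapses first.
    """
    waited = 0.0
    while waited <= max_wait_s:
        try:
            if fetch(base_url).get("status") == "green":
                return True
        except OSError:
            # Connection errors are expected while a node is restarting;
            # treat them as "not green yet" and keep polling.
            pass
        sleep(poll_s)
        waited += poll_s
    return False


def rolling_restart(nodes, restart_node, base_url, **wait_kwargs):
    """Restart nodes one at a time, waiting for green after each.

    restart_node is a hypothetical callback (assumption) that restarts
    a single node. Returns None on success, or the node after which the
    cluster failed to go green, so a human can step in and fix things.
    """
    for node in nodes:
        restart_node(node)
        if not wait_for_green(base_url, **wait_kwargs):
            return node
    return None
```

The loop stops at the first node that does not get the cluster back to green rather than pushing on, which covers the "go in and fix things" case. The upper bound on waiting is deliberately generous, since recovery time depends on how much data the node holds.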
I'd love to hear from people who have figured out a way to do this or any best practices around this.