Security cluster cannot restart


I experienced some issues with my machine and had to restart docker. Somehow the docker /var/run/docker.sock file became a directory, and ECE stopped working.

The security cluster showed as unhealthy, so I tried to restart it, but the restart always hangs without completing.

Is there a way to force the security cluster to stop or one of the nodes to get out of maintenance mode? (Yellow icon)


At what point does the restart hang? You might need to look at the boot or ES logs in /mnt/data/elastic/:allocator_id/services/allocator/containers/elasticsearch/:cluster_id/:instance_id (or possibly the allocator logs in /mnt/data/elastic/:allocator_id/services/allocator/logs, if docker is still unhappy after the restart).

Hey @Alex_Piggott this is what I see in the ES logs:

[2019-05-23T20:31:47,595][INFO ][org.elasticsearch.discovery.zen.ZenDiscovery] [instance-0000000001] failed to send join request to master [{instance-0000000000}{2BzjWsbyShKqjTNsxa4iIA}{CiIN-xGLRnuJmtX69I4uSQ}{xxMachine_IPxx}{xxMachine_IPxx:19296}{logical_availability_zone=zone-0, server_name=instance-0000000000.112164b932f243c1a83c265d348b0708, availability_zone=zone-3, xpack.installed=true, region=unknown-region, instance_configuration=data.default}], reason [RemoteTransportException[[instance-0000000000][][internal:discovery/zen/join]]; nested: ConnectTransportException[[instance-0000000001][xxAdmin_IPxx:19486] connect_exception]; nested: IOException[Connection refused: xxAdmin_IPxx/xxAdmin_IPxx:19486]; nested: IOException[Connection refused]; ]

And this when I try to restart:

[2019-05-23T20:34:24,285][WARN ][org.apache.zookeeper.ClientCnxn] [instance-0000000001] Session 0xa0000c1a83e0030 for server containerhost/xxAdmin_IPxx:22192, unexpected error, closing socket connection and attempting reconnect Connection reset by peer
	at Method) ~[?:?]
	at ~[?:?]
	at ~[?:?]
	at ~[?:?]
	at ~[?:?]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO( ~[zookeeper-3.5.1-alpha.jar:3.5.1-alpha-1693007]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport( ~[zookeeper-3.5.1-alpha.jar:3.5.1-alpha-1693007]
	at org.apache.zookeeper.ClientCnxn$ [zookeeper-3.5.1-alpha.jar:3.5.1-alpha-1693007]

Can you hit the ports listed for instance-0...0 from the host running instance-0...1? (You can see them by running docker ps | grep <clusterid>.)
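If it helps, here is a minimal sketch of that port check in Python, run from the host where the other instance lives. The helper name can_connect is made up for illustration, the host is a placeholder you would replace with the real machine address, and 19296 is simply the transport port that appears in the log line above; the ports on your hosts may differ.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder target: substitute the host and the ports that
    # `docker ps | grep <clusterid>` reports for the other instance.
    targets = [("127.0.0.1", 19296)]
    for host, port in targets:
        state = "open" if can_connect(host, port) else "refused or unreachable"
        print(f"{host}:{port} {state}")
```

A "refused" result only tells you nothing is accepting connections on that port from where you ran the check; it does not say whether the container is down or the port mapping is wrong.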

The errors make it look like the docker port mappings are all messed up following the issue you mentioned. Did you reboot all of the affected hosts? Or maybe you need to recreate the file that got corrupted, or reset its permissions?
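Before recreating anything, it may be worth confirming what /var/run/docker.sock currently is on each host. A rough sketch, assuming Python is available there; path_kind is a hypothetical helper name, not part of ECE or docker:

```python
import os
import stat


def path_kind(path: str) -> str:
    """Classify a path as 'socket', 'directory', 'missing', or 'other'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    # A healthy docker daemon normally exposes this path as a Unix socket.
    print("/var/run/docker.sock is:", path_kind("/var/run/docker.sock"))
```

If this reports "directory", the state described in the original post is still present on that host.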
