Security cluster cannot restart

afuggetta · May 23, 2019, 6:40pm

Hello,

I ran into some issues with my machine and had to restart Docker. Somehow the /var/run/docker.sock file became a directory and ECE stopped working.
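A quick way to confirm the symptom described above is to check what kind of filesystem object /var/run/docker.sock actually is. On a working host it should be a Unix domain socket, not a directory. The Python sketch below is only an illustration; the helper name `classify_path` is hypothetical.

```python
import os
import stat
import sys


def classify_path(path: str) -> str:
    """Report what kind of filesystem object lives at `path`.

    Uses lstat so that a symlink at `path` is reported as a symlink
    instead of being followed.
    """
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "file"
    return "other"


if __name__ == "__main__":
    # Defaults to the path from the post; pass another path to override it.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/run/docker.sock"
    kind = classify_path(target)
    print(f"{target}: {kind}")
    if kind != "socket":
        print("Expected a Unix socket at this path.")
```

If it reports `directory`, that matches what happened here.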

The security cluster showed as unhealthy, so I tried to restart it, but the restart always hangs without completing.

Is there a way to force the security cluster to stop, or to take one of the nodes out of maintenance mode (the yellow icon)?

Thanks

Alex_Piggott · May 23, 2019, 7:16pm

At what point does the restart hang? You might need to look at the boot or ES logs in /mnt/data/elastic/:allocator_id/services/allocator/containers/elasticsearch/:cluster_id/:instance_id (or possibly at the allocator logs in /mnt/data/elastic/:allocator_id/services/allocator/logs if Docker is still unhappy after the restart).
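The two log locations above differ only in their placeholders (:allocator_id, :cluster_id, :instance_id). A minimal Python sketch that fills them in and lists the most recently modified files, assuming the /mnt/data/elastic root given in the reply; the function names and the IDs in the usage block are hypothetical.

```python
from pathlib import Path, PurePosixPath
from typing import List

# Root directory from the reply above; adjust if ECE was installed elsewhere.
ECE_ROOT = PurePosixPath("/mnt/data/elastic")


def instance_log_dir(allocator_id: str, cluster_id: str, instance_id: str) -> PurePosixPath:
    """Boot/ES log directory for a single Elasticsearch instance."""
    return (ECE_ROOT / allocator_id / "services" / "allocator" / "containers"
            / "elasticsearch" / cluster_id / instance_id)


def allocator_log_dir(allocator_id: str) -> PurePosixPath:
    """Log directory of the allocator service itself."""
    return ECE_ROOT / allocator_id / "services" / "allocator" / "logs"


def newest_files(directory, limit: int = 5) -> List[Path]:
    """Most recently modified regular files anywhere under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # Hypothetical IDs: substitute the real allocator, cluster and instance IDs.
    log_dir = instance_log_dir("<allocator_id>", "<cluster_id>", "instance-0000000001")
    for f in newest_files(str(log_dir)):
        print(f)
```

Sorting by modification time avoids guessing log file names, which this sketch does not assume.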

afuggetta · May 23, 2019, 8:37pm

Hey @Alex_Piggott, this is what I see in the ES logs:

[2019-05-23T20:31:47,595][INFO ][org.elasticsearch.discovery.zen.ZenDiscovery] [instance-0000000001] failed to send join request to master [{instance-0000000000}{2BzjWsbyShKqjTNsxa4iIA}{CiIN-xGLRnuJmtX69I4uSQ}{xxMachine_IPxx}{xxMachine_IPxx:19296}{logical_availability_zone=zone-0, server_name=instance-0000000000.112164b932f243c1a83c265d348b0708, availability_zone=zone-3, xpack.installed=true, region=unknown-region, instance_configuration=data.default}], reason [RemoteTransportException[[instance-0000000000][172.17.0.13:19296][internal:discovery/zen/join]]; nested: ConnectTransportException[[instance-0000000001][xxAdmin_IPxx:19486] connect_exception]; nested: IOException[Connection refused: xxAdmin_IPxx/xxAdmin_IPxx:19486]; nested: IOException[Connection refused]; ]
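Reading the `reason` field, the failure appears to be raised on the master (instance-0000000000, per the RemoteTransportException) when it tries to connect back to instance-0000000001 at xxAdmin_IPxx:19486 and that connection is refused. When scanning many such lines, a small Python sketch like the one below can pull out the refused endpoints. The regular expressions are assumptions fitted to this one message format, not a documented log schema, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# e.g. "Connection refused: some-host/10.0.0.5:19486" -> host, address, port
_REFUSED = re.compile(r"Connection refused: ([^/\]\s]+)/([^\]\s]+):(\d+)")
# e.g. "ConnectTransportException[[instance-0000000001][10.0.0.5:19486]" -> node, address
_CONNECT = re.compile(r"ConnectTransportException\[\[([^\]]+)\]\[([^\]]+)\]")


def refused_endpoints(log_line: str) -> List[Tuple[str, int]]:
    """(address, port) pairs that the log line reports as refused."""
    return [(addr, int(port)) for _host, addr, port in _REFUSED.findall(log_line)]


def unreachable_nodes(log_line: str) -> List[Tuple[str, str]]:
    """(node name, address:port) pairs from ConnectTransportException entries."""
    return _CONNECT.findall(log_line)
```

On the line above, this would point at port 19486 on the admin host, which is the kind of port-mapping problem discussed later in the thread.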

And this is what I see when I try to restart:

[2019-05-23T20:34:24,285][WARN ][org.apache.zookeeper.ClientCnxn] [instance-0000000001] Session 0xa0000c1a83e0030 for server containerhost/xxAdmin_IPxx:22192, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:?]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:?]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:?]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:?]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:73) ~[zookeeper-3.5.1-alpha.jar:3.5.1-alpha-1693007]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.5.1-alpha.jar:3.5.1-alpha-1693007]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) [zookeeper-3.5.1-alpha.jar:3.5.1-alpha-1693007]

Alex_Piggott · May 23, 2019, 9:30pm

Can you reach the ports listed for instance-0...0 from the host running instance-0...1? (You can see them by running docker ps | grep <clusterid>.)

The errors make it look like the Docker port mappings got messed up by the issue you mentioned. Did you reboot enough hosts? Or do you perhaps need to recreate the file that got corrupted, or reset its permissions?
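To test the port question in the reply above, one option is to copy the PORTS column that docker ps | grep <clusterid> shows for instance-0...0, then run a check like the following from the host running instance-0...1. This is a minimal Python sketch, not an ECE tool; the script name, arguments and output format are hypothetical, and port ranges are not handled.

```python
import re
import socket
import sys
from typing import List

# One published mapping in a docker ps PORTS value, e.g. "0.0.0.0:19296->19296/tcp"
# or "[::]:19296->19296/tcp". Port ranges such as "8000-8001->..." are skipped.
_MAPPING = re.compile(r"(?:\d{1,3}(?:\.\d{1,3}){3}|\[[0-9a-fA-F:]*\]):(\d+)->\d+/tcp")


def published_tcp_ports(ports_column: str) -> List[int]:
    """Host-side TCP ports from a docker ps PORTS value, deduplicated and sorted."""
    return sorted({int(p) for p in _MAPPING.findall(ports_column)})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


def main(argv: List[str]) -> int:
    host, ports_column = argv[1], argv[2]
    for port in published_tcp_ports(ports_column):
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Example invocation: python check_ports.py <instance-0 host> "0.0.0.0:19296->19296/tcp". IPv4 and IPv6 entries for the same port are collapsed, so each port is tried once.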

system · June 6, 2019, 9:37pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category / tag	Replies	Views	Activity
Restart cluster not working. ID "644057"	Elastic Cloud Enterprise (ECE)	3	885	July 11, 2017
Elastic docker restart	Elasticsearch, docker	4	989	November 7, 2022
Troubleshooting ECE fail after restart	Elastic Cloud Enterprise (ECE)	4	1603	March 12, 2018
Waiting for nodes to stop - timeout	Elastic Cloud Enterprise (ECE)	5	1920	May 15, 2017
Elasticsearch can't restart and join the cluster	Elasticsearch, elastic-stack-security	4	735	December 7, 2021