Allocators lost connection to ZooKeeper

Lafunamor · November 21, 2017, 9:11am

Hi Guys

After I increased the RAM of all allocator nodes (which caused a sequential restart), they are no longer able to connect to ZooKeeper.
Here are a couple of logs I found:
services-forwarder.log:

[2017-11-21 09:02:20,780][INFO ][org.apache.zookeeper.ClientCnxn] Opening socket connection to server 0.0.0.0/0.0.0.0:2181. Will not attempt to authenticate using SASL (unknown error) {}
[2017-11-21 09:02:20,780][WARN ][org.apache.zookeeper.ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect {}
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)

allocator.log: https://gist.github.com/Lafunamor/ed8db7e384a442ecc2cb233c2c1405bb
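
A `Connection refused` like the one in the log above usually means nothing is accepting connections on the target address. A minimal sketch of a check that can be run from the affected host is shown here. Only port 2181 comes from the log; the helper name `can_connect` and the host passed to it are assumptions for illustration, and this is not an ECE tool.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts
        return False
```

For example, `can_connect("127.0.0.1", 2181)` returning `False` on an allocator would be consistent with the forwarder not listening locally, while `True` would point the investigation further upstream.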

I don't know if it's important, but this cluster was upgraded from 1.0.2 to 1.1.0.
Do you guys have any idea how I can recover from this state?

Yuri · November 21, 2017, 1:46pm

Hi,
Can you please check if the container frc-client-forwarders-client-forwarder is running and if there are any errors? Also, it would be great if you could send the logs from that host.

Lafunamor · November 21, 2017, 2:28pm

Hi @Yuri
Thank you for your answer.
The client forwarder is up and running on the allocator and here is the client-forwarder log from the allocator node:
https://gist.github.com/Lafunamor/0ab7d0835c53b2d1a2ee93c11cc80891

Additionally here is the log from the client forwarder on a director node (container is also up and running):
https://gist.github.com/Lafunamor/7e184a6528db3beb10bf658887cf6f78
Let me know if you need any further information or log files.

Yuri · November 21, 2017, 2:52pm

Can you please follow the steps at https://www.elastic.co/guide/en/cloud-enterprise/current/ece-getting-help.html and send the archive to me via direct message?

Yuri · November 21, 2017, 3:06pm

Ah... sorry, I probably misunderstood your problem! Please follow my comment in the topic "Maintenance Mode enabled after upgrade".

Lafunamor · November 21, 2017, 3:23pm

Hey @Yuri
I already tried the solution you posted there. It did not help.
I'm sending you the archive now as a PM.

Lafunamor · November 23, 2017, 8:07am

I was able to resolve the problem thanks to a lot of help from @Yuri.
Thank you very much.

The issue was that my automation tried to perform the version upgrade multiple times, and during a rollback of one such upgrade some configuration got deleted.
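
One way automation can avoid repeating an upgrade is to check the installed version first and to hold a lock while the upgrade runs. The following is only a rough sketch of that idea. The thread does not describe how the automation worked, so the function names, the lock-file approach, and the version comparison are all assumptions, not part of ECE.

```python
import os
from contextlib import contextmanager
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.1.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def should_upgrade(current: str, target: str) -> bool:
    """Upgrade only if the installed version is older than the target."""
    return parse_version(current) < parse_version(target)


@contextmanager
def upgrade_lock(path: str):
    """Hold an exclusive lock file; raises FileExistsError if one already exists."""
    # O_EXCL makes creation fail when the file is already present
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

With this guard, a second run against a cluster already on 1.1.0 would see `should_upgrade("1.1.0", "1.1.0")` return `False` and skip the upgrade, and a run that starts while another is in progress would fail fast on the lock instead of triggering a second upgrade or rollback.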

system · December 7, 2017, 8:07am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
