Allocators lost connection to ZooKeeper

Hi Guys

After I increased the RAM of all allocator nodes (which caused a sequential restart), they are no longer able to connect to ZooKeeper.
Here are a couple of logs I found:

[2017-11-21 09:02:20,780][INFO ][org.apache.zookeeper.ClientCnxn] Opening socket connection to server Will not attempt to authenticate using SASL (unknown error) {}
[2017-11-21 09:02:20,780][WARN ][org.apache.zookeeper.ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect {} Connection refused
        at Method)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
        at org.apache.zookeeper.ClientCnxn$


I don't know if it's important but this cluster was upgraded from 1.0.2 to 1.1.0.
Do you guys have any idea how I can recover from this state?
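
For anyone hitting the same "Connection refused" warning: it generally means nothing accepted the TCP connection at the address the client tried. A quick way to confirm that from the affected host is a plain socket check. This is only a minimal sketch; the function name `can_connect` is made up for illustration, and the host and port must be whatever endpoint your allocator is configured to use (the log above does not show it).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("<host>", <port>)` with the real endpoint returns `False` when the connection is refused. It only tests reachability, so a `True` result does not prove that ZooKeeper itself is healthy.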

Can you please check whether the container frc-client-forwarders-client-forwarder is running and whether there are any errors? Also, it would be great if you could send the logs from that host.

Hi @Yuri
Thank you for your answer.
The client forwarder is up and running on the allocator and here is the client-forwarder log from the allocator node:

Additionally here is the log from the client forwarder on a director node (container is also up and running):
Let me know if you need any further information or log files.

Can you please do the following and send the archive to me via direct message?


Ah... sorry, I probably misunderstood your problem! Please follow my comment in the topic "Maintenance Mode enabled after upgrade".

Hey @Yuri
I already tried the solution you posted there. It did not help.
I'm sending you the archive now as a PM.

I was able to resolve the problem thanks to a lot of help from @Yuri.
Thank you very much.

The issue was that my automation tried to perform the version upgrade multiple times, and during the rollback of one of those upgrades some configuration got deleted.
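
For others automating upgrades, one way to avoid repeated runs is to make the upgrade step idempotent: skip it when the target version is already installed, and refuse to start while another run holds a lock. The following is only a rough sketch of that idea, not what was actually done in this case; `upgrade_once`, `run_upgrade`, and the lock file path are hypothetical names.

```python
import os


def upgrade_once(current_version, target_version, run_upgrade, lock_path):
    """Run run_upgrade() at most once; return 'skipped', 'locked', or 'upgraded'."""
    if current_version == target_version:
        # Nothing to do: the target version is already installed.
        return "skipped"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another upgrade run appears to be in progress.
        return "locked"
    try:
        run_upgrade()
        return "upgraded"
    finally:
        # Release the lock whether the upgrade succeeded or raised.
        os.close(fd)
        os.remove(lock_path)
```

A lock file like this only protects runs on the same host. If the process is killed, the stale lock has to be removed manually before the next attempt.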
