Allocators lost connection to ZooKeeper

Hi Guys

After I increased the RAM of all allocator nodes (which caused a sequential restart), the allocators are no longer able to connect to ZooKeeper.
Here are a couple of logs I found:
services-forwarder.log:

[2017-11-21 09:02:20,780][INFO ][org.apache.zookeeper.ClientCnxn] Opening socket connection to server 0.0.0.0/0.0.0.0:2181. Will not attempt to authenticate using SASL (unknown error) {}
[2017-11-21 09:02:20,780][WARN ][org.apache.zookeeper.ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect {}
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)

allocator.log: https://gist.github.com/Lafunamor/ed8db7e384a442ecc2cb233c2c1405bb

I don't know if it's important, but this cluster was upgraded from 1.0.2 to 1.1.0.
Do you guys have any idea how I can recover from this state?
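
For anyone hitting the same `Connection refused`: the services-forwarder log above shows the ZooKeeper client trying 0.0.0.0:2181 and finding nothing accepting the connection. Here is a minimal sketch for checking that from the affected host, assuming Python 3 is available there. The helper name `can_connect` is made up for illustration, and 127.0.0.1:2181 is an assumption based on the port in the log; substitute whatever address the forwarder is supposed to listen on in your setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 2181 is taken from the log above; the loopback address is an assumption.
    print(can_connect("127.0.0.1", 2181))
```

This only tells you whether something is listening on that port. It does not show why the listener is missing, so the container logs are still needed for that.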

Hi,
Can you please check whether the container frc-client-forwarders-client-forwarder is running and whether there are any errors? Also, it would be great if you could send the logs from that host.

Hi @Yuri
Thank you for your answer.
The client forwarder is up and running on the allocator and here is the client-forwarder log from the allocator node:
https://gist.github.com/Lafunamor/0ab7d0835c53b2d1a2ee93c11cc80891

Additionally here is the log from the client forwarder on a director node (container is also up and running):
https://gist.github.com/Lafunamor/7e184a6528db3beb10bf658887cf6f78
Let me know if you need any further information or log files.

Can you please follow the steps at https://www.elastic.co/guide/en/cloud-enterprise/current/ece-getting-help.html and send the resulting archive to me via direct message?

Ah... sorry, I probably misunderstood your problem! Please follow my comment in the topic "Maintenance Mode enabled after upgrade".

Hey @Yuri
I already tried the solution you posted there. It did not help.
I'm sending you the archive now as a PM.

I was able to resolve the problem thanks to a lot of help from @Yuri.
Thank you very much.

The issue was that my automation tried to perform the version upgrade multiple times, and during the rollback of one of these upgrades some configuration got deleted.