Node stuck in cluster after it crashed


(Nik Everett) #1

One of my nodes crashed today and we weren't able to start the machine again.
Sounds like hardware problems. Anyway, it is still listed in
_cluster/state and shards are trying to relocate to it. Bouncing another
node didn't remove the first node from the list.
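
For anyone checking the same symptom, here is a minimal sketch that inspects an already-parsed `_cluster/state` response for shard copies that sit on, or are relocating to, a given node. It assumes the usual layout of the cluster state JSON (`nodes`, `routing_table.indices.<index>.shards`, and per-copy `node` / `relocating_node` fields); the function name, node IDs, and index name are made up for illustration.

```python
def shards_referencing(state, node_id):
    """List (index, shard, state) for shard copies on, or relocating to, node_id."""
    hits = []
    indices = state.get("routing_table", {}).get("indices", {})
    for index_name, index in indices.items():
        for shard_num, copies in index.get("shards", {}).items():
            for copy in copies:
                if node_id in (copy.get("node"), copy.get("relocating_node")):
                    hits.append((index_name, int(shard_num), copy.get("state")))
    # Sort by index name and shard number only, so a missing state cannot break the sort.
    return sorted(hits, key=lambda hit: hit[:2])


# Hypothetical, trimmed-down cluster state; node IDs and index names are invented.
sample_state = {
    "nodes": {
        "live1": {"name": "es-live-1"},
        "dead1": {"name": "es-dead-1"},
    },
    "routing_table": {"indices": {"logs": {"shards": {
        "0": [{"state": "RELOCATING", "node": "live1", "relocating_node": "dead1"}],
        "1": [{"state": "STARTED", "node": "live1", "relocating_node": None}],
    }}}},
}

print("dead1" in sample_state["nodes"])           # is the dead node still listed?
print(shards_referencing(sample_state, "dead1"))  # shard copies that point at it
```

If the dead node's ID still shows up in `nodes` and in the second list, the master has not yet noticed that the node is gone.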

Is there some way to force the master to check on the down machine? I'm
constantly getting this exception, which I assume is because that node is
down:
exception caught on transport layer [[id: 0x32f61132]], closing connection
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1L-Lh-Tb2n%3DYxOK64i2trw9L4ewHzHLHyGOveLDb04yQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nik Everett) #2

On Mon, Dec 16, 2013 at 9:22 PM, Nikolas Everett <nik9000@gmail.com> wrote:

> One of my nodes crashed today and we weren't able to start the machine again.
> Sounds like hardware problems. Anyway, it is still listed in
> _cluster/state and shards are trying to relocate to it. Bouncing another
> node didn't remove the first node from the list.
>
> Is there some way to force the master to check on the down machine?

I found a way: restart the master and let another master take over.
Blunt, but effective.
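
To know which node to restart, and to confirm afterward that the dead node has dropped out, something like the following sketch may help. It assumes the cluster state JSON carries a top-level `master_node` ID and a `nodes` map keyed by node ID. The base URL, function names, and sample data are assumptions for illustration, and `fetch_state` is defined but not called here.

```python
import json
from urllib.request import urlopen


def fetch_state(base_url="http://localhost:9200"):
    """Fetch and parse the cluster state (base_url is an assumed address)."""
    with urlopen(base_url + "/_cluster/state") as resp:
        return json.load(resp)


def elected_master(state):
    """Return (node_id, node_name) of the master recorded in the cluster state."""
    master_id = state.get("master_node")
    name = state.get("nodes", {}).get(master_id, {}).get("name")
    return master_id, name


def still_listed(state, node_id):
    """True if node_id still appears in the cluster state's node list."""
    return node_id in state.get("nodes", {})


# Hypothetical states before and after restarting the master; all IDs are invented.
before = {
    "master_node": "m1",
    "nodes": {"m1": {"name": "es-a"}, "m2": {"name": "es-b"}, "dead1": {"name": "es-c"}},
}
after = {
    "master_node": "m2",
    "nodes": {"m1": {"name": "es-a"}, "m2": {"name": "es-b"}},
}

print(elected_master(before), still_listed(before, "dead1"))
print(elected_master(after), still_listed(after, "dead1"))
```

Against a live cluster, `fetch_state()` would replace the hand-written dictionaries.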

Nik



(system) #3