org.elasticsearch.transport.ConnectTransportException

In the Hadoop job I used an embedded client. For some reason the job failed.
But even though the job failed and the "embedded client" died, the log is still
spooling this message. I think the cluster is trying to reconnect to the client.
So my question is: is this the right thing for a cluster to do? How do others
deal with this situation, and what should I do?

[2012-05-17 12:39:09,901][WARN ][cluster.service ] [Eros] failed to reconnect to node [Infamnia][YMmBmOEFTkKbTWKSwICnbA][inet[/140.18.62.199:9300]]{client=true, data=false}
org.elasticsearch.transport.ConnectTransportException: [Infamnia][inet[/140.18.62.199:9300]] connect_timeout[30s]
	at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:560)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:503)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:482)
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:128)
	at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:377)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:399)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:361)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:277)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

Any help would be appreciated. This behaviour makes the cluster unusable.

On Thu, May 17, 2012 at 12:41 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

In the Hadoop job I used an embedded client. For some reason the job failed.
But even though the job failed and the "embedded client" died, the log is still
spooling this message. I think the cluster is trying to reconnect to the client.
So my question is: is this the right thing for a cluster to do? How do others
deal with this situation, and what should I do?

Are you closing the Client when you are done? And if you started a Node
client, are you closing the node as well?
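A minimal sketch of what this advice amounts to: release the client and the node on every exit path, including a failed job. To keep it self-contained and runnable, EmbeddedNode, NodeClient, runJob and closed are hypothetical stand-ins, not Elasticsearch classes. With the Java API of that era the real objects would presumably come from something like nodeBuilder().client(true).node() and node.client(), both of which expose close(); treat those calls as an assumption to check against your version's documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: EmbeddedNode and NodeClient are hypothetical stand-ins for an
// embedded client node and its client; they are not Elasticsearch classes.
public class CloseOnFailure {

    // Records the order in which resources were closed, so the behaviour is observable.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for an embedded node started with client=true, data=false.
    static class EmbeddedNode {
        NodeClient client() { return new NodeClient(); }
        void close() { closed.add("node"); }
    }

    // Hypothetical stand-in for the client obtained from that node.
    static class NodeClient {
        void close() { closed.add("client"); }
    }

    // Runs the job body, then closes the client and the node (in that order)
    // whether the body returns normally or throws.
    static void runJob(Runnable body) {
        EmbeddedNode node = new EmbeddedNode();
        try {
            NodeClient client = node.client();
            try {
                body.run();
            } finally {
                client.close();
            }
        } finally {
            node.close();
        }
    }

    public static void main(String[] args) {
        try {
            runJob(() -> { throw new IllegalStateException("simulated job failure"); });
        } catch (IllegalStateException e) {
            System.out.println("job failed: " + e.getMessage());
        }
        System.out.println("closed: " + closed); // prints "closed: [client, node]"
    }
}
```

Note that no finally block runs if the task JVM is killed outright (for example with kill -9); in that case the cluster still has to notice the dead node through its own fault detection.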

On Thu, May 17, 2012 at 9:41 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

In the Hadoop job I used an embedded client. For some reason the job failed.
But even though the job failed and the "embedded client" died, the log is still
spooling this message. I think the cluster is trying to reconnect to the client.
So my question is: is this the right thing for a cluster to do? How do others
deal with this situation, and what should I do?

On Sun, May 20, 2012 at 1:28 PM, Shay Banon kimchy@gmail.com wrote:

Are you closing the Client when you are done? And if you started a Node
client, are you closing the node as well?

I'll look inside the "wonderdog" code. But my question was: why should this
impact other nodes? Failures can occur at any time, and they shouldn't impact
the cluster.

On Thu, May 17, 2012 at 9:41 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

In the Hadoop job I used an embedded client. For some reason the job failed.
But even though the job failed and the "embedded client" died, the log is still
spooling this message. I think the cluster is trying to reconnect to the client.
So my question is: is this the right thing for a cluster to do? How do others
deal with this situation, and what should I do?

You did not mention in the mail that it affects other nodes. Can you
elaborate on it?

On Mon, May 21, 2012 at 1:18 AM, Mohit Anchlia mohitanchlia@gmail.com wrote:

On Sun, May 20, 2012 at 1:28 PM, Shay Banon kimchy@gmail.com wrote:

Are you closing the Client when you are done? And if you started a Node
client, are you closing the node as well?

I'll look inside the "wonderdog" code. But my question was: why should this
impact other nodes? Failures can occur at any time, and they shouldn't impact
the cluster.

On Thu, May 17, 2012 at 9:41 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

In the Hadoop job I used an embedded client. For some reason the job
failed. But even though the job failed and the "embedded client" died, the log
is still spooling this message. I think the cluster is trying to reconnect to
the client. So my question is: is this the right thing for a cluster to do?
How do others deal with this situation, and what should I do?

On Wed, May 23, 2012 at 2:05 PM, Shay Banon kimchy@gmail.com wrote:

You did not mention in the mail that it affects other nodes. Can you
elaborate on it?

What I see is that the nodes that are up keep throwing tons of error
messages. They never take that embedded client node out of the cluster. When I
tried to send more requests to the already active node, it just hung.

On Mon, May 21, 2012 at 1:18 AM, Mohit Anchlia mohitanchlia@gmail.com wrote:

On Sun, May 20, 2012 at 1:28 PM, Shay Banon kimchy@gmail.com wrote:

Are you closing the Client when you are done? And if you started a Node
client, are you closing the node as well?

I'll look inside the "wonderdog" code. But my question was: why should this
impact other nodes? Failures can occur at any time, and they shouldn't impact
the cluster.

On Thu, May 17, 2012 at 9:41 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

In the Hadoop job I used an embedded client. For some reason the job
failed. But even though the job failed and the "embedded client" died, the log
is still spooling this message. I think the cluster is trying to reconnect to
the client. So my question is: is this the right thing for a cluster to do?
How do others deal with this situation, and what should I do?

If you don't close the client (or you kill -9 the process, for example), then
that node will eventually be removed, depending on the fault detection
process and the ping intervals. Do you still see those messages constantly, or
do they go away? Usually, by the way, the socket ends up being closed or
broken, so failure detection is much faster. I'm not sure what happens in
Hadoop in the case of a job failure.
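To put rough numbers on "depending on the fault detection process and the ping intervals", the sketch below computes a worst-case delay before an unresponsive node is dropped. It assumes a simple model that is not taken from this thread: the master pings each node once per ping interval, a failed ping is retried until a fixed number of consecutive attempts have failed, and each attempt may wait up to the ping timeout. The example values (1s, 30s, 3) are assumed to be the defaults of the zen discovery settings discovery.zen.fd.ping_interval, discovery.zen.fd.ping_timeout and discovery.zen.fd.ping_retries; verify them for your version. When the socket is closed or the connection is refused, as in the "Connection refused" trace above, each attempt fails almost immediately instead of waiting out the timeout, which is why detection is usually much faster.

```java
// Rough model, not Elasticsearch code: estimates how long an unresponsive node
// can stay in the cluster before fault detection removes it.
public class FaultDetectionEstimate {

    // Worst case: the node stops responding right after a successful ping, so one
    // full interval passes before the next ping, and then every one of the
    // pingRetries consecutive attempts runs until its full timeout.
    static long worstCaseRemovalSeconds(long pingIntervalSec, long pingTimeoutSec, int pingRetries) {
        if (pingIntervalSec < 0 || pingTimeoutSec < 0 || pingRetries < 1) {
            throw new IllegalArgumentException("interval and timeout must be >= 0, retries >= 1");
        }
        return pingIntervalSec + (long) pingRetries * pingTimeoutSec;
    }

    public static void main(String[] args) {
        // Assumed defaults: ping_interval=1s, ping_timeout=30s, ping_retries=3.
        System.out.println(worstCaseRemovalSeconds(1, 30, 3)); // prints 91
    }
}
```

Under these assumptions a hung (rather than cleanly closed) client node could linger for about a minute and a half. Lowering the timeout or the retry count would shorten that window, at the cost of more spurious removals on a slow or congested network.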

On Thu, May 24, 2012 at 1:13 AM, Mohit Anchlia mohitanchlia@gmail.com wrote:

On Wed, May 23, 2012 at 2:05 PM, Shay Banon kimchy@gmail.com wrote:

You did not mention in the mail that it affects other nodes. Can you
elaborate on it?

What I see is that the nodes that are up keep throwing tons of error
messages. They never take that embedded client node out of the cluster. When I
tried to send more requests to the already active node, it just hung.

On Mon, May 21, 2012 at 1:18 AM, Mohit Anchlia mohitanchlia@gmail.com wrote:

On Sun, May 20, 2012 at 1:28 PM, Shay Banon kimchy@gmail.com wrote:

Are you closing the Client when you are done? And if you started a Node
client, are you closing the node as well?

I'll look inside the "wonderdog" code. But my question was: why should this
impact other nodes? Failures can occur at any time, and they shouldn't impact
the cluster.

On Thu, May 17, 2012 at 9:41 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

In the Hadoop job I used an embedded client. For some reason the job
failed. But even though the job failed and the "embedded client" died, the log
is still spooling this message. I think the cluster is trying to reconnect to
the client. So my question is: is this the right thing for a cluster to do?
How do others deal with this situation, and what should I do?
