Occasional "failed to get node info" and "no node available" errors


(xiaofeng) #1

Our ES system works fine most of the time, but we occasionally see
the following exceptions:

2012-02-29 12:33:53 INFO [org.elasticsearch.client.transport]:109 Line - [Ezekiel] failed to get node info for [Ani-Mator][q4e_N2YaRsiJy4a-U7xTEg][inet[/10.35.0.210:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [Ani-Mator][inet[/10.35.0.210:9300]][cluster/nodes/info] request_id [60] timed out after [5004ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:347)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

and then

org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:170)
    at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:97)
    at org.elasticsearch.client.support.AbstractClient.count(AbstractClient.java:236)
    at org.elasticsearch.client.transport.TransportClient.count(TransportClient.java:337)
    at org.elasticsearch.action.count.CountRequestBuilder.doExecute(CountRequestBuilder.java:124)
    at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:53)
    at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:47)

The cluster is on the same machine (one node with 3 shards), so there
should be no network problem.

Any idea what caused this error?


(xiaofeng) #2

The strange thing is that we run queries from several threads, and when
one thread hit the "failed to get node info" error, the other threads
kept working fine.


(Shay Banon) #3

How many nodes do you connect to in the cluster? The failure means that the transport client's periodic check, which rebuilds its list of nodes, failed to get the node info for that specific node.
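
To make the mechanism described above concrete, here is a simplified model of a client that periodically rebuilds its node list and drops nodes whose info request times out. It is only a sketch and not the Elasticsearch implementation: the class `NodeSamplerSketch`, its methods, and the `answersInTime` predicate are hypothetical names invented for illustration. It shows why a single slow response in a one-node cluster can leave the client with an empty list until the next successful round.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Simplified model of a client-side node sampler (hypothetical, for illustration only). */
class NodeSamplerSketch {
    // Addresses the client was configured with.
    private final List<String> configured;
    // Nodes that answered the most recent sampling round.
    private volatile List<String> live = new ArrayList<>();

    NodeSamplerSketch(List<String> configured) {
        this.configured = new ArrayList<>(configured);
    }

    /** One sampling round: keep only nodes whose info request answered in time. */
    void sample(Predicate<String> answersInTime) {
        List<String> next = new ArrayList<>();
        for (String node : configured) {
            if (answersInTime.test(node)) {
                next.add(node);
            }
        }
        live = next;
    }

    /** Pick a node for a request, failing if the last round left none. */
    String pickNode() {
        List<String> snapshot = live;
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("No node available");
        }
        return snapshot.get(0);
    }
}
```

In this model, a round in which the only configured node fails to answer leaves `pickNode()` throwing until a later round succeeds, which would be consistent with errors that appear occasionally and then go away on their own.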


(xunyong) #4

The client uses a connection pool (pool size 10). We simulated 20 threads inserting at the same time, with each thread making sure to connect and then disconnect.


(xunyong) #5

I think this error may be caused by MongoDB. MongoDB and ES are currently installed on the same machine. After turning MongoDB off, we no longer see the error.


(xiaofeng) #6

We have only one node in the cluster.

We ran MongoDB on the same machine. Since stopping the mongo service,
we haven't seen the failure.

So we suspect the failure is due to insufficient resources
(CPU, I/O, memory) on the node?


(Shay Banon) #7

Sounds like it...
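
If the failures are transient, as they appear to be when the machine is briefly starved of resources, one possible client-side mitigation is to retry the failed call after a short pause. The following is a minimal sketch under that assumption: `RetrySketch` and `withRetry` are hypothetical names, and catching `RuntimeException` is a stand-in for catching only the client's "no node available" exception in real code. It does not fix the underlying resource contention.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper; not part of the Elasticsearch client. */
class RetrySketch {
    /**
     * Runs the call, retrying on failure up to maxAttempts times in total,
     * doubling the pause between attempts.
     */
    static <T> T withRetry(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay *= 2;
            }
        }
        throw last;
    }
}
```

A caller would wrap the request in the supplier, for example `RetrySketch.withRetry(() -> runCount(), 3, 200L)`, where `runCount` is a placeholder for whatever method issues the count request.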

