org.elasticsearch.transport.ReceiveTimeoutTransportException...timed out after [5002ms] without any hints and exceptions in cluster


(deer) #1

Hi,

My Elasticsearch cluster has recently been showing high response times.

Elasticsearch cluster: version 1.1.1, 8 nodes
Java client: client.transport.sniff is true

After checking, I found that all Java clients get the exception
org.elasticsearch.transport.ReceiveTimeoutTransportException...timed out
after [5002ms].
It only happens on connections to one particular node (the problem node).

The master node and the other nodes are fine. The cluster works fine after
the problem node is removed.

I've enabled the DEBUG level in logging.yml and checked all the logs on the
master and the other nodes. There are no exceptions or hints about the timeout.
Are there any other options to trace this error?

The nodes in the cluster can see each other, so there should be no problem
with zen discovery.
I also tried using telnet from the client machine to connect to the problem
node. It works.

On the problem node, dmesg shows: possible SYN flooding on port 9300. Sending
cookies.
Changing net.ipv4.tcp_max_syn_backlog does not help.
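A single successful telnet session does not rule out intermittent connection drops, which is what a SYN flooding warning may point to. As a rough sketch (not part of the original post), repeated timed TCP connects from the client machine could show whether some handshakes to port 9300 are slow or fail. The class name ConnectProbe, the 100 attempts, and the 1000 ms "slow" threshold are assumptions chosen for illustration; the code avoids newer language features so it should compile on an old JDK such as 1.6.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    /**
     * Opens one TCP connection and returns how long the handshake took in
     * milliseconds, or -1 if the connect failed or timed out.
     */
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1000000L;
        } catch (IOException e) {
            // Covers refused connections as well as SocketTimeoutException.
            return -1;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    /** Runs several probes and counts those that failed or exceeded slowMillis. */
    static int countSlowOrFailed(String host, int port, int attempts,
                                 int timeoutMillis, long slowMillis) {
        int bad = 0;
        for (int i = 0; i < attempts; i++) {
            long ms = connectMillis(host, port, timeoutMillis);
            if (ms < 0 || ms > slowMillis) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Host and port are placeholders; pass the problem node's address.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        int bad = countSlowOrFailed(host, port, 100, 5000, 1000);
        System.out.println(bad + " of 100 connects to " + host + ":" + port
                + " failed or took longer than 1000 ms");
    }
}
```

Running it against the problem node and against a healthy node from the same client machine would give a comparison. A non-zero count only for the problem node would suggest looking at the network path or the listen queue on that machine rather than at Elasticsearch itself.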

Rebooting the machine and reinstalling Elasticsearch does not help.

I tried running the problem node as a cluster by itself and connecting to it
with the same Java client. It works fine.

When the client side gets the exception
org.elasticsearch.transport.ReceiveTimeoutTransportException...timed out
after [5002ms],
there are no other exceptions or stack traces on either the server side or
the client side.

I'm not sure whether this is a network issue, a machine issue, or an
Elasticsearch issue. What can I do to learn more about this timeout exception?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1a211bd9-9d8f-46a1-a311-384912ed8da8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

Nodes timing out can be indicative of heavy GC. Do the logs show anything
in that regard?

Can you share more info on how big your nodes are, what your dataset size
is, and what Java version you are on?

On 9 December 2014 at 15:48, Hui dannyhui1103@gmail.com wrote:



(deer) #3

Hi,

Thanks for the reply.

I checked, and none of the nodes are doing any GC.

For the cluster:

  • version 1.1.1
  • 8 nodes
  • there are 2 indices:
    index 1
    size: 55.1G (164G)
    docs: 4,272,425 (7,155,663)
    index 2
    size: 113G (341G)
    docs: 7,717,476 (11,271,866)

For each node:
memory: 14 GB

Both the server side and the client side are using jdk1.6.0_30

average request count : ~1000
average response time : ~100ms

Only a single node has the timeout issue.

Detailed log (client):
2014-12-09 08:36:30,959 INFO org.elasticsearch.client.transport - [Alkhema] failed to get local cluster state for [10.1.4.196:9200][vs5uD2kLTXGkWDNrgsAZig][cluster_name][inet[/10.1.4.196:9200]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [10.1.4.196:9200][inet[/10.1.4.196:9300]][cluster/state] request_id [1074667] timed out after [5002ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
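
To confirm from the client logs that the timeouts really involve only one node, the exception messages could be tallied per target address. The following is a minimal sketch, assuming each exception message sits on a single line in the real log file (the mailing list may wrap long lines) and follows the format shown above. The class name TimeoutLogSummary and the regular expression are illustrative assumptions, not part of Elasticsearch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutLogSummary {

    // Matches e.g. "[inet[/10.1.4.196:9300]][cluster/state] request_id [1074667]
    // timed out after [5002ms]" and captures the transport address.
    private static final Pattern TIMEOUT = Pattern.compile(
            "inet\\[/([^\\]]+)\\]\\]\\[[^\\]]+\\] request_id \\[\\d+\\] "
            + "timed out after \\[\\d+ms\\]");

    /** Counts timeout messages per transport address. */
    static Map<String, Integer> countByNode(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            Matcher m = TIMEOUT.matcher(line);
            while (m.find()) {
                String address = m.group(1);
                Integer old = counts.get(address);
                counts.put(address, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample line taken from the client log above, joined onto one line.
        List<String> sample = Arrays.asList(
                "org.elasticsearch.transport.ReceiveTimeoutTransportException: "
                + "[10.1.4.196:9200][inet[/10.1.4.196:9300]][cluster/state] "
                + "request_id [1074667] timed out after [5002ms]");
        System.out.println(countByNode(sample));
    }
}
```

Feeding it the lines of each client's log would show whether any address other than the problem node ever appears.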

There are no exceptions, GC activity, or log entries on the server side when
the timeout occurs.

Many thanks.

On Tuesday, December 9, 2014 10:57:28 PM UTC+8, Mark Walkom wrote:


