My Elasticsearch cluster has recently had high response times.
Elasticsearch cluster: version 1.1.1, 8 nodes
Java client: client.transport.sniff is true
After checking, I found that all Java clients throw the exception
org.elasticsearch.transport.ReceiveTimeoutTransportException...timed out
after [5002ms]
It only happens on connections to one particular node (the problem node).
The master node and the other nodes are fine. The cluster works fine after
removing the problem node.
I've enabled DEBUG level in logging.yml and checked all the logs on the
master and the other nodes. There are no exceptions and no hints about the
timeout. Are there any other options for tracing this error?
The nodes in the cluster can see each other, so there should be no problem
with zen discovery.
I also tried using telnet from the client machine to connect to the problem
node. It works.
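A single telnet session only shows that one connection can be opened. A rough sketch of a repeated check is below: it opens several TCP connections in a row and prints how long each connect takes, which may show intermittent slow or failed connects. The class name ConnectProbe, the 20 attempts, and the default host (the address from the client log, assumed here to be the problem node) are illustrative choices, not anything Elasticsearch provides.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    /**
     * Opens one TCP connection and returns the connect time in milliseconds,
     * or -1 if the connect failed or timed out.
     */
    public static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1000000L;
        } catch (IOException e) {
            // Covers refused connections and SocketTimeoutException alike.
            return -1L;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: address from the client log, transport port 9300.
        String host = args.length > 0 ? args[0] : "10.1.4.196";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        for (int i = 0; i < 20; i++) {
            System.out.println("attempt " + i + ": "
                    + connectMillis(host, port, 5000) + " ms");
        }
    }
}
```

Running it from a client machine against the problem node and against a healthy node gives two sets of numbers to compare. It only measures the TCP connect, not the Elasticsearch transport handshake.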
On the problem node, we found this message in dmesg:
possible syn flooding on port 9300. sending cookies
Changing net.ipv4.tcp_max_syn_backlog does not help.
Rebooting the machine and reinstalling Elasticsearch do not help either.
I tried using the problem node to host a cluster by itself and connecting
to it with the same Java client. It works fine.
Apart from this client-side exception, there are no exceptions or stack
traces on either the server side or the client side:
org.elasticsearch.transport.ReceiveTimeoutTransportException...timed out
after [5002ms]
I'm not sure whether this is a network issue, a machine issue, or an
Elasticsearch issue. What can I do to learn more about this timeout exception?
There are 2 indices.
Index 1
size: 55.1G (164G)
docs: 4,272,425 (7,155,663)
Index 2
size: 113G (341G)
docs: 7,717,476 (11,271,866)
Each node:
memory: 14 GB
Both the server and the client side are using jdk1.6.0_30.
average request count: ~1000
average response time: ~100ms
Only a single node has the timeout issue.
Detailed log (client):
2014-12-09 08:36:30,959 INFO org.elasticsearch.client.transport - [Alkhema] failed to get local cluster state for [10.1.4.196:9200][vs5uD2kLTXGkWDNrgsAZig][cluster_name][inet[/10.1.4.196:9200]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [10.1.4.196:9200][inet[/10.1.4.196:9300]][cluster/state] request_id [1074667] timed out after [5002ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
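To check that the timeouts really concentrate on one node, the client logs can be tallied per target address. The following is a minimal sketch, assuming each exception message sits on a single line in the format shown above; the class name TimeoutCounter and the regular expression are illustrative and written only against that one message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutCounter {

    // Matches the tail of a ReceiveTimeoutTransportException message, e.g.
    // [inet[/10.1.4.196:9300]][cluster/state] request_id [1074667] timed out after [5002ms]
    // Group 1 captures the target node address as "ip:port".
    private static final Pattern TIMEOUT = Pattern.compile(
            "inet\\[/([0-9.]+:\\d+)\\]\\]\\[[^\\]]+\\]\\s+request_id\\s+\\[\\d+\\]"
            + "\\s+timed out after \\[\\d+ms\\]");

    /** Counts timeout messages per target node address. */
    public static Map<String, Integer> countByNode(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            Matcher m = TIMEOUT.matcher(line);
            while (m.find()) {
                String node = m.group(1);
                Integer old = counts.get(node);
                counts.put(node, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input: the exception message from the client log above.
        List<String> sample = Arrays.asList(
                "org.elasticsearch.transport.ReceiveTimeoutTransportException: "
                + "[10.1.4.196:9200][inet[/10.1.4.196:9300]][cluster/state] request_id "
                + "[1074667] timed out after [5002ms]");
        System.out.println(countByNode(sample));
    }
}
```

Feeding it the lines of each client's log file would give one count per node address, which can then be compared across clients.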
The server side shows no exceptions, GC activity, or log entries when the timeout occurs.
Many thanks.
On Tuesday, December 9, 2014 10:57:28 PM UTC+8, Mark Walkom wrote:
Nodes timing out can be indicative of heavy GC. Do the logs show anything
in that regard?
Can you share more info on how big your nodes are, what your dataset size
is, and what Java version you are on?