ReceiveTimeoutTransportException errors on Elasticsearch nodes

[2020-04-06T09:21:03,615][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [prod0] failed to execute on node [_bHsnuTsTE-omVcgDE_2fQ]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [prod1][10.109.42.245:9300][cluster:monitor/nodes/stats[n]] request_id [8280550] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:940) [elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.1.jar:6.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
[2020-04-06T09:21:38,864][WARN ][o.e.t.TransportService   ] [y76762] Received response for a request that has timed out, sent [50249ms] ago, timed out [35249ms] ago, action [cluster:monitor/nodes/stats[n]], node [{y76764}{_bHsnuTsTE-omVcgDE_2fQ}{b84_fvfVSQyUHfaWNtH1aA}{10.109.42.245}{10.109.42.245:9300}], id [8280550]
[2020-04-06T10:03:05,445][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [prod0] failed to execute on node [_bHsnuTsTE-omVcgDE_2fQ]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [prod1][10.109.42.245:9300][cluster:monitor/nodes/stats[n]] request_id [9138261] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:940) [elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.1.jar:6.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
[2020-04-06T10:03:44,806][WARN ][o.e.t.TransportService   ] [prod0] Received response for a request that has timed out, sent [54361ms] ago, timed out [39361ms] ago, action [cluster:monitor/nodes/stats[n]], node [{prod1}{_bHsnuTsTE-omVcgDE_2fQ}{b84_fvfVSQyUHfaWNtH1aA}{10.109.42.245}{10.109.42.245:9300}], id [9138261]

If you want a response, I would recommend you provide some additional context and describe how you got this error, rather than just posting a stack trace.

Thanks Christian,

I was about to add more details but got pulled onto something else.

Please see the information below.

Recently we started to see the above errors on this cluster. We see that the shard sizes have reached their recommended limits, but is that the cause of these errors?

elastic+ 1530 1 11 Apr04 ? 06:22:54 /bin/java -Xms31g -Xmx31g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Xms31g -Xmx31g -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quie
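If it helps, the heap settings can also be cross-checked through the nodes info API. This is just a generic sanity check, run locally against port 9200 in the same style as the commands below:

curl 0:9200/_nodes/jvm?pretty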

curl 0:9200/_cat/nodes?v

ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.109.42.243 29 98 4 1.11 0.83 0.79 mdi * y76762
10.109.42.244 39 81 3 0.56 0.57 0.65 mdi - y76763
10.109.42.245 74 87 4 0.42 0.49 0.79 mdi - y76764

curl 0:9200/_cat/shards?v

index shard prirep state docs store ip node
service 3 r STARTED 65707318 47.6gb 10.109.42.245 y76764
service 3 p STARTED 65707305 47.7gb 10.109.42.243 y76762
service 4 r STARTED 65725879 47.4gb 10.109.42.244 y76763
service 4 p STARTED 65725868 47.5gb 10.109.42.243 y76762
service 1 r STARTED 71039936 53.2gb 10.109.42.245 y76764
service 1 p STARTED 71039921 53.2gb 10.109.42.243 y76762
service 2 p STARTED 71136809 53.1gb 10.109.42.245 y76764
service 2 r STARTED 71136809 53.1gb 10.109.42.244 y76763
service 0 p STARTED 71084352 53.1gb 10.109.42.245 y76764
service 0 r STARTED 71084348 53.1gb 10.109.42.244 y76763
.kibana 0 r STARTED 3 16.8kb 10.109.42.244 y76763
.kibana 0 p STARTED 3 16.8kb 10.109.42.243 y76762

Please let me know if you need any other information.

I'm looking to identify what is causing this issue and how to mitigate it.
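In the meantime I can gather more diagnostics if that helps. For example (assuming the node that keeps timing out is y76764 / 10.109.42.245, as the log entries above suggest), I could capture hot threads on that node and the thread pool queues across the cluster with the standard APIs:

curl 0:9200/_nodes/y76764/hot_threads
curl '0:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected'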

Hello Team,

Has anyone had similar issues? Do you need any other details?

Is there anything in the Elasticsearch logs indicating frequent or long GC? What type of hardware is this cluster deployed on?
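GC pressure usually shows up as JvmGcMonitorService warnings in the node logs, and the cumulative collection times are available through the nodes stats API. For example (assuming the default package-install log location on your VMs):

grep -i JvmGcMonitorService /var/log/elasticsearch/*.log
curl 0:9200/_nodes/stats/jvm?pretty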

Hello Christian,

No, there are no traces of long GC. These nodes are deployed on VMs with 64G RAM (31G heap, per the JVM options above) & 8 vCPUs.

We see only the "timed out" and "received response" entries in the logs, and the cluster state goes to yellow and back quite frequently.
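If it helps, the next time the cluster goes yellow I can capture the cluster health and any active recoveries, for example:

curl 0:9200/_cluster/health?pretty
curl '0:9200/_cat/recovery?v&active_only=true'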

Regards,
Vibin

Any suggestions?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.