[2018-08-06T16:26:24,830][INFO ][o.e.m.j.JvmGcMonitorService] [...] [gc][866954] overhead, spent [292ms] collecting in the last [1s]
[2018-08-06T16:26:27,831][INFO ][o.e.m.j.JvmGcMonitorService] [...] [gc][866957] overhead, spent [305ms] collecting in the last [1s]
[2018-08-06T16:26:30,955][INFO ][o.e.m.j.JvmGcMonitorService] [...] [gc][866960] overhead, spent [469ms] collecting in the last [1.1s]
[2018-08-06T16:26:34,956][INFO ][o.e.m.j.JvmGcMonitorService] [...] [gc][866964] overhead, spent [251ms] collecting in the last [1s]
[2018-08-06T16:27:59,397][INFO ][o.e.d.z.ZenDiscovery ] [...] master_left [{...-master}{HVzJwsDrQieIEMIe-c1KSg}{aH617e42Q6u5OPWBOv1B_Q}{...}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-08-06T16:27:59,397][WARN ][o.e.d.z.ZenDiscovery ] [...] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{...}{qMUnADcLRA2nCpdqxF61ZQ}{8RG1XNiCTDOGNnVJ_VT23w}{...}
{...}{S79zJu2uRciv4NuTyDKObQ}{rJHFMpUpQdeoj90-Z-Ewcw}{...}
{...}{4EwNJclTTBSYJl1bSLCaCA}{FZ38bVYnRoOXMaTP08nBaA}{...}, local
{...-master}{HVzJwsDrQieIEMIe-c1KSg}{aH617e42Q6u5OPWBOv1B_Q}{...}, master
{...}{fhsKrrlISL-k_pzLn6fKrg}{FUuWfW17TEqEXAZftHPLTQ}{...}
{...-master}{ex16_BD9T9GOaFz3hVv4cw}{NyOFn5vzSvCZ6xXYllHbqA}{...}
{...}{Oy__dkFsStOqvpj5odSfpQ}{bDS_fVH_R3ezoDGmLaIdWA}{...}
{...}{aT8vSnbmSo6z8Ezuy9zvaQ}{xzyFq-cAROe2rXcGjB4qXg}{...}
{...}{esIsbvmlRg-aZxbMdFoZvQ}{ZFr_G0rsS_iHqpkmX9vAAg}{...}
{...}{ABsIgUwvS7WT2-3Oo69BvA}{80DQYjqmSZON-ZvNT3fdUw}{...}
{...-master}{JGI5LsWiQtKuULxPjGw0zQ}{bZrMT64WSbCNBqTOLPMPGA}{...}
{...}{EHHOEIe4SYyn5YTyvlwUxQ}{TpQeXLgAT9q5cMPP5Lx8gw}{...}
The above is a log excerpt from one of the data nodes: its pings to the master timed out, so it decided the master had left, even though the master was actually online. At the same time, the master's pings to that data node also timed out, so the master decided the data node had left (see the master's logs below).
[2018-08-06T16:27:58,352][INFO ][o.e.c.r.a.AllocationService] [...-master] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{...}{4EwNJclTTBSYJl1bSLCaCA}{FZ38bVYnRoOXMaTP08nBaA}{...} failed to ping, tried [3] times, each with maximum [30s] timeout]).
[2018-08-06T16:27:58,352][INFO ][o.e.c.s.MasterService ] [...-master] zen-disco-node-failed({...}{4EwNJclTTBSYJl1bSLCaCA}{FZ38bVYnRoOXMaTP08nBaA}{...}), reason(failed to ping, tried [3] times, each with maximum [30s] timeout)[{...}{4EwNJclTTBSYJl1bSLCaCA}{FZ38bVYnRoOXMaTP08nBaA}{...} failed to ping, tried [3] times, each with maximum [30s] timeout], reason: removed {{...}{4EwNJclTTBSYJl1bSLCaCA}{FZ38bVYnRoOXMaTP08nBaA}{...},}
This situation has occurred several times, and there were always long GC pauses shortly before the ping timeouts. I don't understand how GC overhead can cause three consecutive pings to time out when each ping has a maximum timeout of 30s. Is there any configuration or workaround to avoid these ping timeouts?
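For reference, these appear to be the relevant Zen discovery fault-detection settings in elasticsearch.yml; the values below are the 6.x defaults, which match the "tried [3] times, each with maximum [30s] timeout" wording in the logs. Would raising ping_timeout or ping_retries be a reasonable workaround, or would that just hide the underlying GC problem?

discovery.zen.fd.ping_interval: 1s   # how often a node pings the master (and the master pings the nodes); default
discovery.zen.fd.ping_timeout: 30s   # how long to wait for each ping response; default
discovery.zen.fd.ping_retries: 3     # consecutive failures before the other node is considered gone; default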
The JVM configuration:
-Xms8g
-Xmx8g
-XX:+UseG1GC
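To capture the actual stop-the-world pause times, rather than just the overhead summaries above, GC logging could be enabled. A minimal sketch, assuming JDK 8 (the log path is just an example):

-Xloggc:/var/log/elasticsearch/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime   # logs the total time application threads were stopped, including safepoint pauses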