GC - Node freezes and all operations fail from client

Pradeep_Gowda · May 6, 2016, 8:27am

Hey All,

I am currently running ES 1.7.1 with an 8GB heap on a 16-core machine, doing filter/aggregation operations on demand. The node was fine for a few days, but then I suddenly started seeing GC logs:

[monitor.jvm ] [vManage Node] [gc][young][1342325][31487] duration [1s], collections [1]/[1.3s], total [1s]/[8.6m], memory [7.2gb]->[7.1gb]/[9.5gb], all_pools {[young] [480.8mb]->[8.5mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [6.6gb]->[7gb]/[8.6gb]}

and all operations from the Java client started failing:

 org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes were available
.
.
.
Caused by: org.elasticsearch.transport.NodeDisconnectedException: [Node][inet[/127.0.0.1:9300]][indices:data/write/index] disconnected

and

[transport] (elasticsearch[Blob][generic][T#22]) [Blob] failed to get local cluster state for [Node][][localhost][inet[/127.0.0.1:9300]], disconnecting...: org.elasticsearch.transport.ReceiveTimeoutTransportException: [Node][inet[/127.0.0.1:9300]][cluster:monitor/state] request_id [308108] timed out after [15001ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529) [elasticsearch-1.7.1.jar:]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_25]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_25]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_25]

Why does the server become unresponsive when it goes into GC? GC is supposed to run in the background without disconnecting the client.

Can anyone help me with this issue?

warkolm · May 6, 2016, 8:28am

GC is a stop-the-world process, but it shouldn't really cause major issues when the pauses are that short.
Is the node overloaded?

Pradeep_Gowda · May 6, 2016, 8:40am

This happened while I was running bulk insert operations; each bulk request was under 5MB or 5000 records.

[2016-05-05 15:10:08,126][INFO ][cluster.metadata         ] [Node] [index1] update_mapping [index1-2016-05-05T15] (dynamic)
[2016-05-05 15:23:54,781][WARN ][monitor.jvm              ] [Node] [gc][young][1342325][31487] duration [1s], collections [1]/[1.3s], total [1s]/[8.6m], memory [7.2gb]->[7.1gb]/[9.5gb], all_pools {[young] [480.8mb]->[8.5mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [6.6gb]->[7gb]/[8.6gb]}
[2016-05-05 15:24:02,141][WARN ][monitor.jvm              ] [Node] [gc][young][1342331][31492] duration [1.4s], collections [1]/[1.9s], total [1.4s]/[8.6m], memory [6.7gb]->[6.3gb]/[9.5gb], all_pools {[young]     [552.9mb]->[2.8mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [6gb]->[6.2gb]/[8.6gb]}
.
.   
GC
GC
GC      
.
.
[2016-05-05 15:44:26,114][WARN ][monitor.jvm              ] [Node] [gc][old][1342993][38] duration [39.3s], collections [1]/[40.1s], total [39.3s]/[7.9m], memory [9.5gb]->[8.9gb]/[9.5gb], all_pools {[young] [865.3mb]->[350.4mb]/[865.3mb]}{[survivor] [64.6mb]->[0b]/[108.1mb]}{[old] [8.6gb]->[8.6gb]/[8.6gb]}
[2016-05-05 15:45:05,751][WARN ][monitor.jvm              ] [Node] [gc][old][1342996][39] duration [36.8s], collections [1]/[37s], total [36.8s]/[8.5m], memory [9.5gb]->[7.8gb]/[9.5gb], all_pools {[young] [865.3mb]->[5.3mb]/[865.3mb]}{[survivor] [95.9mb]->[0b]/[108.1mb]}{[old] [8.6gb]->[7.8gb]/[8.6gb]}
[2016-05-05 16:00:01,030][INFO ][cluster.metadata         ] [Node] [index1] update_mapping [index1-2016-05-05T16] (dynamic)

It took approximately 1 hour for the GC to finish. During that time, the client was totally useless. How can we handle this kind of scenario gracefully?

Bruce_Ritchie · May 6, 2016, 1:16pm

Your node is running out of memory and the JVM is having to do full GCs to try to free up space. Restart it. It could be running out of memory for any number of reasons: too much data for the available memory, a memory leak in ES, or something else. There are a number of threads in this group with solutions to issues like yours.
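
The old-generation figures in your log lines bear this out: the [old] pool is still at 8.6gb of 8.6gb after a 39.3s collection. As a rough sketch (not an official tool), something like the following could pull the pause length and post-collection old-gen occupancy out of monitor.jvm lines in the format you pasted. The function name and the 15-second threshold are my own choices; the threshold is taken from the "timed out after [15001ms]" in your client trace, and the regex assumes the 1.7.1 log layout shown in this thread.

```python
import re

# Matches the pause duration and the old-generation pool of a monitor.jvm GC line.
GC_LINE = re.compile(
    r"\[gc\]\[(?P<kind>young|old)\]\[\d+\]\[\d+\]\s+"
    r"duration\s+\[(?P<value>[\d.]+)(?P<unit>ms|s|m|h)\]"
    r".*\{\[old\]\s+\[(?P<before>[^\]]+)\]->\[(?P<after>[^\]]+)\]/\[(?P<cap>[^\]]+)\]\}"
)
SIZE = re.compile(r"^([\d.]+)(b|kb|mb|gb|tb)$")
BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

# Assumption: 15 s, taken from "timed out after [15001ms]" in the client trace.
CLIENT_TIMEOUT_S = 15.0


def parse_size(text):
    """Convert a size such as '8.6gb' or '0b' to bytes."""
    match = SIZE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    return float(match.group(1)) * BYTES[match.group(2)]


def summarize_gc_line(line):
    """Return pause length and old-gen occupancy after GC, or None if not a GC line."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    pause_s = float(match.group("value")) * SECONDS[match.group("unit")]
    after = parse_size(match.group("after"))
    cap = parse_size(match.group("cap"))
    return {
        "kind": match.group("kind"),
        "pause_s": pause_s,
        "old_after_pct": round(100 * after / cap, 1),
        "exceeds_client_timeout": pause_s > CLIENT_TIMEOUT_S,
    }


# One of the old-generation lines from this thread.
SAMPLE = (
    "[2016-05-05 15:44:26,114][WARN ][monitor.jvm              ] [Node] "
    "[gc][old][1342993][38] duration [39.3s], collections [1]/[40.1s], "
    "total [39.3s]/[7.9m], memory [9.5gb]->[8.9gb]/[9.5gb], all_pools "
    "{[young] [865.3mb]->[350.4mb]/[865.3mb]}{[survivor] [64.6mb]->[0b]/[108.1mb]}"
    "{[old] [8.6gb]->[8.6gb]/[8.6gb]}"
)

if __name__ == "__main__":
    print(summarize_gc_line(SAMPLE))
```

If the old-gen percentage after a collection stays near 100 across several lines, the collector is not reclaiming anything and the pauses will keep coming until the heap pressure is removed.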

The general solution is to run a cluster of 3 nodes, so that losing any single node won't kill the cluster.
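
On the client side, one way to ride out shorter pauses is to retry a failed bulk request with exponential backoff instead of failing on the first disconnect. This is only a sketch of the idea, written in Python for brevity rather than against the Java client: `send_bulk` is a hypothetical callable standing in for whatever actually submits the batch, and `ConnectionError` stands in for the client's own exception (NoNodeAvailableException in your trace). It will not help if the node stays stuck in full GCs for an hour, and a retried batch can index documents twice unless they carry explicit IDs.

```python
import time


def send_with_retry(send_bulk, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_bulk(batch), retrying with exponential backoff on ConnectionError.

    send_bulk is a hypothetical callable; ConnectionError is a stand-in for the
    real client's "no node available" exception.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_bulk(batch)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_send(batch):
        # Fails twice, then succeeds, to imitate a node coming back after a pause.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("node unavailable")
        return len(batch)

    print(send_with_retry(flaky_send, ["doc1", "doc2"], base_delay=0.01))
```

The `sleep` parameter is there only so the delay can be swapped out in tests.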
