Clustered elasticsearch gc.log

Mohammad_Mousavi · February 2, 2020, 10:03am

Hi, I have 3 Elasticsearch nodes, each with a 5 GB JVM heap. I have 1 primary shard with 1 replica. Lines like the ones below keep being appended to gc.log. Is this an error? What does it mean?
Thanks

[2020-02-02T10:00:34.709+0000][14109][gc,start     ] GC(68149) Pause Young (Allocation Failure)
[2020-02-02T10:00:34.709+0000][14109][gc,task      ] GC(68149) Using 4 workers of 4 for evacuation
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) Desired survivor size 17432576 bytes, new threshold 6 (max threshold 6)
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) Age table with threshold 6 (max threshold 6)
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) - age   1:    1115400 bytes,    1115400 total
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) - age   2:     470608 bytes,    1586008 total
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) - age   3:       1816 bytes,    1587824 total
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) - age   4:      10760 bytes,    1598584 total
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) - age   5:       1528 bytes,    1600112 total
[2020-02-02T10:00:34.724+0000][14109][gc,age       ] GC(68149) - age   6:        224 bytes,    1600336 total
[2020-02-02T10:00:34.724+0000][14109][gc,heap      ] GC(68149) ParNew: 284153K->8001K(306688K)
[2020-02-02T10:00:34.724+0000][14109][gc,heap      ] GC(68149) CMS: 3022676K->3022677K(4902144K)
[2020-02-02T10:00:34.724+0000][14109][gc,metaspace ] GC(68149) Metaspace: 107250K->107250K(1148928K)
[2020-02-02T10:00:34.724+0000][14109][gc           ] GC(68149) Pause Young (Allocation Failure) 3229M->2959M(5086M) 15.640ms
[2020-02-02T10:00:34.724+0000][14109][gc,cpu       ] GC(68149) User=0.04s Sys=0.01s Real=0.02s
[2020-02-02T10:00:34.725+0000][14109][safepoint    ] Safepoint "GenCollectForAllocation", Time since last: 5982923 ns, Reaching safepoint: 250995 ns, At safepoint: 15756951 ns, Total: 16007946 ns
[2020-02-02T10:00:35.705+0000][14109][safepoint    ] Safepoint "RevokeBias", Time since last: 980388443 ns, Reaching safepoint: 367264 ns, At safepoint: 52815 ns, Total: 420079 ns
[2020-02-02T10:00:36.706+0000][14109][safepoint    ] Safepoint "Cleanup", Time since last: 1000200897 ns, Reaching safepoint: 245363 ns, At safepoint: 33374 ns, Total: 278737 ns
[2020-02-02T10:00:37.706+0000][14109][safepoint    ] Safepoint "Cleanup", Time since last: 1000238336 ns, Reaching safepoint: 188525 ns, At safepoint: 21925 ns, Total: 210450 ns
[2020-02-02T10:00:38.707+0000][14109][safepoint    ] Safepoint "Cleanup", Time since last: 1000224098 ns, Reaching safepoint: 194067 ns, At safepoint: 22823 ns, Total: 216890 ns
[2020-02-02T10:00:39.445+0000][14109][safepoint    ] Safepoint "RevokeBias", Time since last: 738362932 ns, Reaching safepoint: 181674 ns, At safepoint: 43321 ns, Total: 224995 ns
[2020-02-02T10:00:39.718+0000][14109][safepoint    ] Safepoint "RevokeBias", Time since last: 271690683 ns, Reaching safepoint: 567103 ns, At safepoint: 330003 ns, Total: 897106 ns
[2020-02-02T10:00:40.718+0000][14109][safepoint    ] Safepoint "Cleanup", Time since last: 1000154600 ns, Reaching safepoint: 221991 ns, At safepoint: 16287 ns, Total: 238278 ns

Armin_Braun · February 2, 2020, 5:53pm

Hi @Mohammad_Mousavi

The GC log is enabled in ES by default, as documented here. Entries in the GC log do not imply an error; they are simply information about the behaviour of the GC in your system.
Looking at your logs, I can see that the nodes are under some pressure, but if you are not experiencing any issues with your cluster, I can't see anything of concern in them.
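
To judge how much time the young collections actually cost, the summary lines can be pulled out of gc.log with a few lines of Python. This is only a rough sketch: the regex is written against the log format shown in the excerpt above, and the names `PAUSE_RE` and `parse_young_pauses` are made up for illustration, not part of any Elasticsearch or JVM tooling.

```python
import re

# Matches the one-line summary the JVM prints per young collection, e.g.
# [..][gc           ] GC(68149) Pause Young (Allocation Failure) 3229M->2959M(5086M) 15.640ms
# The pattern is an assumption derived from the excerpt in this thread.
PAUSE_RE = re.compile(
    r"\[gc\s*\]\s+GC\((?P<id>\d+)\)\s+Pause Young \((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\)\s+"
    r"(?P<ms>\d+(?:\.\d+)?)ms"
)


def parse_young_pauses(lines):
    """Return one dict per 'Pause Young' summary line found in `lines`."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match is None:
            continue  # not a summary line (gc,start / gc,age / safepoint ...)
        pauses.append({
            "gc_id": int(match["id"]),
            "cause": match["cause"],
            "heap_before_mb": int(match["before"]),
            "heap_after_mb": int(match["after"]),
            "heap_total_mb": int(match["total"]),
            "pause_ms": float(match["ms"]),
        })
    return pauses


# Two lines copied from the log in the first post.
sample = [
    "[2020-02-02T10:00:34.709+0000][14109][gc,start     ] GC(68149) Pause Young (Allocation Failure)",
    "[2020-02-02T10:00:34.724+0000][14109][gc           ] GC(68149) Pause Young (Allocation Failure) 3229M->2959M(5086M) 15.640ms",
]
pauses = parse_young_pauses(sample)
```

In the excerpt, "Allocation Failure" is just the reason the young collection was triggered (the young generation filled up), and the pause was about 16 ms. Running the same function over a whole gc.log and looking at the largest `pause_ms` values should show whether any pause is long enough to matter.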

Mohammad_Mousavi · February 3, 2020, 11:27am

Hi @Armin_Braun
To be honest, the 3 nodes are in 3 separate data centers.
The connections are fairly stable, but sometimes nodes become unavailable and come back a few seconds later, which causes shards to be reallocated.
I know it's not best practice to set up a cluster over WAN, but in this case I have to.
I have other databases, like Galera, that work fine on this network, but only after I tuned the Galera configs and sysctl.conf to stabilize Galera over WAN. I thought there might be some tuning docs for running an Elasticsearch cluster over WAN. I would be very grateful if you could help with this.
Thanks a lot.

Armin_Braun · February 3, 2020, 2:49pm

@Mohammad_Mousavi

As far as I know, we don't have any documentation that would be helpful for the cluster-over-WAN use case. ES assumes that the connection between nodes is sufficiently stable and low-latency that it won't be the bottleneck.

The relocations caused by nodes briefly becoming unavailable are something you can potentially fix, though. See the documentation here.
You can delay allocation when a node leaves the cluster by setting something along the lines of

"index.unassigned.node_left.delayed_timeout": "5m"
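
As a rough sketch of how that setting could be applied to existing indices, the snippet below builds a `PUT <index>/_settings` request against the index settings API using only the Python standard library. The base URL `http://localhost:9200`, the `_all` index target, and the helper name `build_delayed_timeout_request` are assumptions for illustration; a real cluster will likely also need authentication and TLS.

```python
import json
import urllib.request


def build_delayed_timeout_request(base_url, index="_all", timeout="5m"):
    """Build (but do not send) a PUT <index>/_settings request.

    `base_url` and `index` are placeholders; adjust them for your cluster.
    """
    body = json.dumps({
        "settings": {
            "index.unassigned.node_left.delayed_timeout": timeout,
        }
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local, unsecured node; nothing is sent until urlopen() is called.
req = build_delayed_timeout_request("http://localhost:9200")
# To actually apply the setting:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

With a 5 minute delay, a node that drops out for a few seconds should rejoin before the cluster starts allocating replacement replicas for its shards. The trade-off is that after a node is genuinely lost, those replicas stay unassigned for the full timeout.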

system · March 2, 2020, 2:49pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.