Main problem with garbage collector

Guys, I really need help. I've researched a lot on the internet but couldn't solve this myself.
I have an ELK stack with an Elasticsearch 6.5 server that has been running for 3 years. Lately I have had a lot of problems ingesting data from Logstash into Elasticsearch; it keeps failing with errors.
The server has 64 GB of RAM and the Elasticsearch heap size is 48 GB.

In my elasticsearch.log file I see the following entries:
[2021-07-02T11:25:29,501][INFO ][o.e.m.j.JvmGcMonitorService] [jRaE4Ys] [gc][1548814] overhead, spent [278ms] collecting in the last [1s]
[2021-07-02T11:25:32,548][INFO ][o.e.m.j.JvmGcMonitorService] [jRaE4Ys] [gc][1548817] overhead, spent [284ms] collecting in the last [1s]

This occurs every second.

In my GC (garbage collector) log, I see this:

: 991400K->87772K(996800K), 0.0555702 secs] 28322834K->27477579K(50220928K), 0.0558269 secs] [Times: user=0.58 sys=0.00, real=0.05 secs]
2021-07-02T11:28:04.998-0300: 1564168.407: Total time for which application threads were stopped: 0.0615021 seconds, Stopping threads took: 0.0026259 seconds
2021-07-02T11:28:05.350-0300: 1564168.759: [GC (Allocation Failure) 2021-07-02T11:28:05.350-0300: 1564168.759: [ParNew
Desired survivor size 56688640 bytes, new threshold 1 (max 6)
- age   1:   64528552 bytes,   64528552 total
- age   2:   45103376 bytes,  109631928 total
: 973852K->110720K(996800K), 0.0488516 secs] 28363659K->27510347K(50220928K), 0.0491062 secs] [Times: user=0.59 sys=0.00, real=0.05 secs]
2021-07-02T11:28:05.399-0300: 1564168.808: Total time for which application threads were stopped: 0.0529833 seconds, Stopping threads took: 0.0004834 seconds
2021-07-02T11:28:05.702-0300: 1564169.112: [GC (Allocation Failure) 2021-07-02T11:28:05.702-0300: 1564169.112: [ParNew
Desired survivor size 56688640 bytes, new threshold 1 (max 6)

Checking the monitoring in Kibana, the Elasticsearch JVM heap usage is always close to 38 GB, with a good margin below the 48 GB available.

I could use some pointers; thanks for any tips.

This sounds like the problem. The recommendation is to keep the Java heap at 50% of the available host RAM (assuming no other processes are running there), up to a maximum of around 30GB. This allows Java to use compressed object pointers, which is more efficient. As soon as you go over that level you stop benefitting from this and the usable heap space actually goes down. I would recommend reading this blog post and this part of the documentation for further details.
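
If it helps to verify, here is a minimal sketch (assuming a single node listening on localhost:9200 with no authentication; adjust the URL for your setup) that asks the nodes info API what heap the node actually started with and whether compressed object pointers are in use:

    import requests  # pip install requests

    # Hypothetical endpoint; adjust host/port/auth for your cluster.
    NODES_URL = "http://localhost:9200/_nodes/jvm"

    for node_id, node in requests.get(NODES_URL).json()["nodes"].items():
        jvm = node["jvm"]
        heap_gb = jvm["mem"]["heap_max_in_bytes"] / 1024 ** 3
        # "true" means the heap is small enough for compressed ordinary
        # object pointers (oops) to be in effect.
        compressed = jvm.get("using_compressed_ordinary_object_pointers", "unknown")
        print(f"{node_id}: heap_max={heap_gb:.1f} GiB, compressed oops={compressed}")

The Elasticsearch startup log should also print a similar line with the heap size and whether compressed ordinary object pointers are enabled.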


I had already tried to follow this recommendation in the past but was unsuccessful. After your tip I tried again with a 32 GB heap and still get the entries below in the GC log; this happens practically every second.
Any more suggestions?
Thanks for your time.

- age   1:   66388528 bytes,   66388528 total
: 971343K->89885K(996800K), 0.0505691 secs] 17053661K->16227764K(33443712K), 0.0507470 secs] [Times: user=0.48 sys=0.01, real=0.05 secs]
2021-07-05T08:44:57.524-0300: 392.346: Total time for which application threads were stopped: 0.0523419 seconds, Stopping threads took: 0.0007045 seconds
2021-07-05T08:44:57.767-0300: 392.589: [GC (Allocation Failure) 2021-07-05T08:44:57.767-0300: 392.589: [ParNew
Desired survivor size 56688640 bytes, new threshold 6 (max 6)

The limit is below 32GB, not exactly 32GB. I think the blog post I linked to describes how to identify whether you are below the cutoff point.
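
One way to check the cutoff from the command line, assuming the java binary on your PATH is the same one Elasticsearch runs with, is to ask the JVM directly whether a given -Xmx still leaves UseCompressedOops enabled. A rough sketch:

    import subprocess

    # Try the heap sizes you are considering; the JVM prints its final flag
    # values, including whether compressed oops stay enabled at that size.
    for xmx in ("31g", "32g"):
        flags = subprocess.run(
            ["java", f"-Xmx{xmx}", "-XX:+PrintFlagsFinal", "-version"],
            capture_output=True, text=True,
        ).stdout
        oops = [line.strip() for line in flags.splitlines()
                if "UseCompressedOops" in line]
        print(xmx, oops)

Typically the flag flips to false somewhere around 32 GB, which is why staying a bit under that (for example 30-31 GB) is the safe choice.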

What Java version?

How many CPUs? Any thread pool tuning?

ES: how many nodes / indices / shards / documents?

Most of the time, GC pressure like this is caused by a wrong architecture or too much traffic.

What Java version?
java version "1.8.0_191"

How many CPUs?
CPU(s): 16, CPU MHz: 2394.000
Any thread pool tuning?
I don't know what this is or how to check it.

ES: how many nodes / indices / shards / documents?
1 node, 282 Indices, 664 Shards, 32.6m Documents
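
For the thread pool question above, a quick way to look at the pools and the shard layout is the cat APIs (a hedged sketch; it assumes the node is on localhost:9200 with no security enabled). Growing rejection counters on the write pool usually mean the ingest traffic exceeds what one node can absorb:

    import requests

    BASE = "http://localhost:9200"  # adjust for your node

    # Per-pool queue and rejection counters; rising "rejected" numbers on the
    # write pool point at ingest pressure rather than a GC bug.
    print(requests.get(f"{BASE}/_cat/thread_pool",
                       params={"v": "true",
                               "h": "name,size,queue,rejected,completed"}).text)

    # 664 shards on a single node: list them with their sizes to see how the
    # 32.6M documents are spread out.
    print(requests.get(f"{BASE}/_cat/shards",
                       params={"v": "true", "s": "store:desc"}).text)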

  1. Upgrade the Java version if possible (Elastic Support Matrix | Elasticsearch).
  2. 1 node with replicas? If yes, disable the replicas (see the sketch below).
  3. That is too many shards for 32M documents; what is the index size?
  4. I am sure a single index with 8 shards could be better.
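
A minimal sketch for point 2, assuming the node is at localhost:9200 without security: on a one-node cluster replica shards can never be assigned anyway, so setting number_of_replicas to 0 removes the unassigned replica copies from the cluster state. The second call lists index sizes for point 3:

    import requests

    BASE = "http://localhost:9200"  # hypothetical endpoint, adjust as needed

    # Point 2: set replicas to 0 on every index; on a single node the replica
    # copies stay unassigned and only add overhead.
    resp = requests.put(f"{BASE}/_all/_settings",
                        json={"index": {"number_of_replicas": 0}})
    print(resp.status_code, resp.json())

    # Point 3: list indices largest-first to see where the data actually lives.
    print(requests.get(f"{BASE}/_cat/indices",
                       params={"v": "true", "s": "store.size:desc",
                               "h": "index,pri,rep,docs.count,store.size"}).text)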

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.