Error in JvmGcMonitorService (GC overhead)

I am having the following problem.

$ cat /var/log/elasticsearch/elasticsearch.log | grep WARN
[2021-07-09T00:04:51,795][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][young][214003][2290] duration [1.7s], collections [1]/[2.5s], total [1.7s]/[21.7m], memory [5.7gb]->[3.4gb]/[30gb], all_pools {[young] [512mb]->[16mb]/[0b]}{[old] [4.4gb]->[3.4gb]/[30gb]}{[survivor] [848mb]->[64mb]/[0b]}
[2021-07-09T00:04:51,899][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][214003] overhead, spent [1.7s] collecting in the last [2.5s]
[2021-07-09T01:14:38,495][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][young][218183][2319] duration [5.4s], collections [1]/[6s], total [5.4s]/[21.8m], memory [5.7gb]->[4gb]/[30gb], all_pools {[young] [176mb]->[32mb]/[0b]}{[old] [4.3gb]->[3.9gb]/[30gb]}{[survivor] [1.2gb]->[64mb]/[0b]}
[2021-07-09T01:14:38,505][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][218183] overhead, spent [5.4s] collecting in the last [6s]
[2021-07-09T01:30:04,271][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][219107] overhead, spent [718ms] collecting in the last [1.1s]
[2021-07-09T02:17:41,291][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][young][221962][2344] duration [1.5s], collections [1]/[2.3s], total [1.5s]/[21.8m], memory [21.1gb]->[4.7gb]/[30gb], all_pools {[young] [16.7gb]->[64mb]/[0b]}{[old] [3.2gb]->[3.2gb]/[30gb]}{[survivor] [1.1gb]->[1.3gb]/[0b]}
[2021-07-09T02:17:41,313][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][221962] overhead, spent [1.5s] collecting in the last [2.3s]
[2021-07-09T11:39:17,849][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][young][255645][2418] duration [2.5s], collections [1]/[2.5s], total [2.5s]/[22m], memory [5.3gb]->[3.1gb]/[30gb], all_pools {[young] [736mb]->[32mb]/[0b]}{[old] [3.9gb]->[2.9gb]/[30gb]}{[survivor] [768mb]->[72mb]/[0b]}
[2021-07-09T11:39:17,851][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][255645] overhead, spent [2.5s] collecting in the last [2.5s]
[2021-07-09T12:31:40,883][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][young][258786][2434] duration [1.7s], collections [1]/[2.2s], total [1.7s]/[22m], memory [21.2gb]->[4.5gb]/[30gb], all_pools {[young] [16.7gb]->[80mb]/[0b]}{[old] [3.3gb]->[3.3gb]/[30gb]}{[survivor] [1.1gb]->[1gb]/[0b]}
[2021-07-09T12:31:41,374][WARN ][o.e.m.j.JvmGcMonitorService] [ITS-ELST-01] [gc][258786] overhead, spent [1.7s] collecting in the last [2.2s]
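For reference, a quick way to quantify these warnings is to pull out the "spent X collecting in the last Y" pairs. A minimal sketch, assuming the default log path used above:

```shell
# Summarize GC overhead warnings: print "time-spent of window" per warning.
# Assumes the default log location; adjust the path for your installation.
grep 'overhead, spent' /var/log/elasticsearch/elasticsearch.log \
  | sed -E 's/.*spent \[([^]]+)\] collecting in the last \[([^]]+)\].*/\1 of \2/'
```

On the log above this yields ratios like `5.4s of 6s`: the node is repeatedly spending most of a multi-second window in GC, which is exactly what triggers the warning.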

I think this warning indicates that heap memory is exhausted, but I believe the heap is amply provisioned, based on the following values.

  • Memory capacity
$ free
              total        used        free      shared  buff/cache   available
Mem:       32779936    31255628      222328         204     1301980     1121152
Swap:      33554424     6078000    27476424
  • Heap size allocated to Elasticsearch
$ cat /etc/elasticsearch/jvm.options
################################################################
## IMPORTANT: JVM heap size
################################################################
-Xms30g
-Xmx30g
  • Heap size in use
$ curl -X GET "localhost:9200/_cat/nodes?v=true&h=heap.current&pretty"
heap.current
      13.2gb
  • Number of shards
$ curl -X GET http://localhost:9200/_count?pretty
{
  "count" : 8005914274,
  "_shards" : {
    "total" : 712,
    "successful" : 712,
    "skipped" : 0,
    "failed" : 0
  }
}

What steps should I take?

I suspect the issue is that you have swap enabled which can slow down GC considerably and thereby cause problems and instability. The recommendation is that you disable swap and set heap size to 50% of available RAM assuming there are no other processes running on the host.
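To put that recommendation into practice, the usual sequence is to turn swap off now and keep it off across reboots by commenting out its fstab entry. A sketch, not a definitive recipe — verify the actual swap entry on your host before editing /etc/fstab:

```shell
# Turn off all swap devices immediately (requires enough free RAM to
# absorb the pages currently swapped out).
sudo swapoff -a

# Comment out the swap entry in /etc/fstab so it stays off after reboot.
# The sed pattern assumes a line whose filesystem-type field is "swap";
# a .bak backup of the original file is kept.
sudo sed -i.bak -E 's|^([^#].*[[:space:]]swap[[:space:]].*)|# \1|' /etc/fstab
```

If removing swap entirely is not an option, setting `vm.swappiness=1` at least minimizes swapping.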

Which version of Elasticsearch are you using?

The Elasticsearch version is 7.12.

The number of shards is currently about 700, but it is expected to increase to about 1500 soon.
In that case, is it still OK to set the heap size to 16GB?
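One way to sanity-check this is Elastic's published rule of thumb of at most roughly 20 shards per GB of heap — a guideline, not a hard limit. Plugging in this thread's numbers:

```shell
heap_gb=16          # planned heap size
shards_planned=1500 # expected shard count on this node

# Rule of thumb: aim for no more than ~20 shards per GB of heap.
ceiling=$((heap_gb * 20))
echo "guideline ceiling for ${heap_gb}GB heap: ${ceiling} shards"
# → guideline ceiling for 16GB heap: 320 shards
```

By that guideline, 1500 shards on a single 16GB-heap node is well over the ceiling, so reducing shard count (larger indices, fewer primaries, shorter retention) or adding nodes would be worth considering alongside the heap change.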

Heap size allocation is now set to 50% of RAM.

$ cat /etc/elasticsearch/jvm.options
################################################################
## IMPORTANT: JVM heap size
################################################################
-Xms16g
-Xmx16g

$ systemctl restart elasticsearch

However, before I could check the GC logs, log collection started failing.
According to Logstash, Elasticsearch is unreachable, even though Elasticsearch is running.

Will Logstash reconnect?

$ tail -n 30 /var/log/logstash/logstash-plain.log
[2021-07-09T16:39:06,740][ERROR][logstash.outputs.elasticsearch][main][5a453aee58166ab36916f072e60343fa3613aadce50fa0c5edb6d39850da3625] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>64}
... snip ...
[2021-07-09T16:39:23,221][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://localhost:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://localhost:9200/][Manticore::SocketException] Connection refused"}
$ systemctl status elasticsearch
elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-07-09 16:39:25 JST; 14min ago
     Docs: https://www.elastic.co
 Main PID: 27807 (java)
   CGroup: /system.slice/elasticsearch.service
           ├─27807 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.n..
           └─28007 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Jul 09 16:38:20 ITS-ELST-01 systemd[1]: Starting Elasticsearch...
Jul 09 16:39:25 ITS-ELST-01 systemd[1]: Started Elasticsearch.

Which version of Elasticsearch are you using? Did you disable swap?

The Elasticsearch version is 7.12.

I failed to disable swap.

$ sudo swapoff -a
swapoff: /swapfile: swapoff failed: Unable to allocate memory
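For what it's worth, `swapoff -a` fails with "Unable to allocate memory" when there is not enough free RAM to absorb the pages currently swapped out; it usually succeeds after memory is freed up, or after a reboot once the fstab entry is removed. Until then, a common stopgap is to minimize swapping rather than disable it — a sketch (the drop-in filename here is an arbitrary choice):

```shell
# Lower the kernel's tendency to swap without removing swap entirely.
sudo sysctl -w vm.swappiness=1

# Persist the setting across reboots (the filename is arbitrary).
echo 'vm.swappiness=1' | sudo tee /etc/sysctl.d/99-swappiness.conf
```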

How much RAM does the host have?

32GB available.

$ free
              total        used        free      shared  buff/cache   available
Mem:       32779936    16430940    14724380           8     1624616    15946296
Swap:      16777212     6632888    10144324

This is not a problem at the moment.

When Elasticsearch is restarted, it seems to take some time, maybe 30 minutes, for Logstash to reconnect.


I have configured the following settings, but the warning logs continue to appear.

$ vi /etc/elasticsearch/elasticsearch.yml
bootstrap.memory_lock: true
$ vi /etc/security/limits.conf
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

Perhaps the settings have not taken effect.
Do you know what I should do?

$ curl -X GET "localhost:9200/_nodes?filter_path=**.mlockall&pretty"
{
  "nodes" : {
    "L5dAcPbfSpK9HjITjkcmDw" : {
      "process" : {
        "mlockall" : false
      }
    }
  }
}
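One likely explanation: on systemd-based installs, settings in `/etc/security/limits.conf` apply to login sessions but not to services, so the memlock limit never reaches the Elasticsearch process and `bootstrap.memory_lock` fails. The documented fix is a systemd drop-in that raises `LimitMEMLOCK` — a sketch (the same override can be created interactively with `systemctl edit elasticsearch`):

```shell
# limits.conf does not apply to systemd services; set the limit in a
# drop-in override instead, then reload systemd and restart the service.
sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
sudo tee /etc/systemd/system/elasticsearch.service.d/override.conf <<'EOF'
[Service]
LimitMEMLOCK=infinity
EOF
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch
```

After the restart, the `mlockall` value from the `_nodes` API query above should report `true`.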

I would like to maintain indices of over 1000 shards to hold multiple types of logs for 60 days.
I expanded the memory to 32GB for this purpose, but can Elasticsearch really only use up to 50% of it?
If so, would I actually need 64GB of RAM in order to set a 32GB heap?
(That is rather unrealistic, but...)

I'll let you know how it goes.

As before, mlockall is still false and I still cannot disable the swap area, but I have confirmed that there have been no errors in the week since my last report.

The three files that I edited are as follows, and the changes are as described in my previous post.

  • /etc/elasticsearch/jvm.options
  • /etc/elasticsearch/elasticsearch.yml
  • /etc/security/limits.conf

Swap is still not disabled, which goes against Elastic's recommendation and still needs to be addressed.
For now, though, I would like to close this case, as the issue has been resolved.

@Christian_Dahlqvist
Thank you for sticking with me to resolve the error.

I have the same error.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.