Elasticsearch Garbage Collection Issue

Hi Team,

I am continuously observing my ES cluster going into [gc] mode. Below is my cluster configuration:
3 data nodes, each with the following H/W configuration (cores/RAM/disk),
1: 8C/64GB/6TB
2: 8C/64GB/6TB
3: 8C/64GB/6TB

Current Cluster Stats:
Total Number of Shards: 290 (including replicas)
Primary Shards: 145
Heap Size Assigned: 25GB
Cluster Health: Green
Elasticsearch Version: 6.3.0
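
For reference, the same numbers can be pulled from the standard cluster APIs, e.g. with calls along these lines (assuming the default HTTP port on one of the nodes):

curl -s 'localhost:9200/_cluster/health?pretty'        # status and shard counts
curl -s 'localhost:9200/_cluster/stats?human&pretty'   # node count, heap and index totals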

The above-mentioned cluster generally runs properly without issues, yet it keeps going into garbage collection mode and then goes completely down.

Does anyone have any ideas on how to keep this cluster running smoothly?

Elasticsearch Logs:

[2020-04-22T07:30:17,924][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][94883] overhead, spent [334ms] collecting in the last [1s]
[2020-04-22T07:32:09,993][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][94995] overhead, spent [262ms] collecting in the last [1s]
[2020-04-22T08:00:23,866][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][96688] overhead, spent [383ms] collecting in the last [1s]
[2020-04-22T08:00:51,894][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][96716] overhead, spent [257ms] collecting in the last [1s]
[2020-04-22T08:01:00,896][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][96725] overhead, spent [296ms] collecting in the last [1s]
[2020-04-22T08:01:01,896][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][96726] overhead, spent [349ms] collecting in the last [1s]
[2020-04-22T08:01:02,896][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][96727] overhead, spent [332ms] collecting in the last [1s]
[2020-04-22T08:20:23,717][INFO ][o.e.c.m.MetaDataMappingService] [ykmWSiI] [dsdb-20200422/K5EiSEzESiuuClJobrZe3Q] update_mapping [evt]
[2020-04-22T08:30:16,736][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][98480] overhead, spent [318ms] collecting in the last [1s]
[2020-04-22T08:32:16,901][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][98600] overhead, spent [380ms] collecting in the last [1s]

How long are the GCs running for?
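
The nodes stats API reports cumulative GC counts and times per node, so something along these lines (default port assumed) shows how much time the collectors have been taking:

curl -s 'localhost:9200/_nodes/stats/jvm?human&pretty'   # look at jvm.gc.collectors.young/old: collection_count and collection_time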

Hi Warkolm,

Thanks for the reply.

The cluster has been in this GC state for the last 15 days.

Can you elaborate on what you mean by "goes completely down"?

Yes, sure.

Whenever Elasticsearch remains in garbage collection for a long time, search stops functioning and I get the error below:

elasticsearch is not running
:point_up_2: this is what I call "goes completely down"

To check the Elasticsearch status, I used the command below:

/etc/init.d/elasticsearch status

What is the output when you run that? What do the logs on the node show?
Can you call localhost:9200 and see what that returns?
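
For example, something like this from one of the nodes (assuming the default port and nothing in front of it):

curl -s 'localhost:9200/'                        # basic node and cluster info
curl -s 'localhost:9200/_cluster/health?pretty'  # overall cluster status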

When I run /etc/init.d/elasticsearch status, I can see the Elasticsearch status, as I mentioned earlier. No logs were recorded for that.

If I call localhost:9200 with the health endpoint, I can see the cluster health.
Currently the cluster health is green, but the cluster is still in GC (garbage collection); that is what the logs I shared in my first comment show.

Ok, it's not really clear to me what is happening here, sorry.

  • Your logs do not show that Elasticsearch is down at all
  • You get a response from the API during GC
  • I don't know the details of how init.d checks whether the service is up, but if the API is working and you can see the process in ps, then I think it's ok (e.g. with the quick checks sketched below)
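
A quick pair of checks along these lines (the grep pattern is just one way to spot the JVM process) would confirm both:

ps aux | grep '[o]rg.elasticsearch.bootstrap.Elasticsearch'                      # is the Elasticsearch JVM running?
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'   # per-node heap usage via the API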

Okay, let me clarify the scenario; please find my inline responses:

  • Elasticsearch shows only the GC logs.
  • Yes, I got the response during the GC state.
  • Elasticsearch is installed as a service, hence I am able to check its status using init.d

Below is the problem for which I am trying to find a solution:

ES goes into GC mode frequently, and I need to solve this GC problem. My final aim is to prevent ES from going into GC.

Is there any setting that I need to configure in ES to prevent the GC?
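
For reference, the 25GB heap mentioned above is set through the standard jvm.options file (path assumed from a package install); a quick check like this should show it:

grep '^-Xm' /etc/elasticsearch/jvm.options
# expected to print, given the heap size above:
# -Xms25g
# -Xmx25g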

Current Elasticsearch logs indicating the GC state:

(The same [gc] overhead INFO entries as shared in my first post.)

You cannot stop Elasticsearch from doing GC; it's a totally normal thing.

Unless the GC is causing Elasticsearch to be unresponsive via the API and is running for multiple seconds, which would be logged as warnings, there's nothing to worry about :slight_smile:

Can you post the full output of the cluster stats API?

How much data do you have per node?
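
Something along these lines would cover both (default port assumed):

curl -s 'localhost:9200/_cluster/stats?human&pretty'   # full cluster stats output
curl -s 'localhost:9200/_cat/allocation?v'             # shards and disk used per node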

@warkolm
Also, I want to highlight that the API also becomes unresponsive when ES goes into GC and goes down.

@warkolm, how do I recover Elasticsearch from garbage collection?

Meaning, if it is in GC mode, how can I recover it and bring it back to a normal state?

Hi Team,

Can anyone please explain each term in the log line below?

[2020-04-22T08:32:16,901][INFO ][o.e.m.j.JvmGcMonitorService] [ykmWSiI] [gc][98600] overhead, spent [380ms] collecting in the last [1s]

Can you please provide the information I asked for earlier? This will help us get a better understanding of the situation.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.