Elasticsearch JvmGcMonitorService overhead

Hi, we are running ELK 5.4.3 on a single node with one daily index of time-series data (documents are never updated). Each daily index holds around 25 GB of data and is stored in 2 shards.
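For reference, the index sizes and shard layout can be double-checked with the cat APIs; a quick sketch, assuming Elasticsearch on the default localhost:9200:

 curl 'localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'
 curl 'localhost:9200/_cat/shards?v'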

A few times a day I see warnings like these in the Elasticsearch log:

[2019-01-07T09:35:25,984][WARN ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][young][7635766][3393290] duration [2.2s], collections [1]/[2.6s], total [2.2s]/[1.9d], memory [8.4gb]->[8.4gb]/[12.3gb], all_pools {[young] [2.1mb]->[1mb]/[865.3mb]}{[survivor] [99.5mb]->[101.9mb]/[108.1mb]}{[old] [8.3gb]->[8.3gb]/[11.4gb]}
[2019-01-07T09:35:25,984][WARN ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][7635766] overhead, spent [2.2s] collecting in the last [2.6s]
[2019-01-07T09:38:29,660][WARN ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][old][7635767][784] duration [3m], collections [1]/[3m], total [3m]/[1.1h], memory [8.4gb]->[2.5gb]/[12.3gb], all_pools {[young] [1mb]->[9.9mb]/[865.3mb]}{[survivor] [101.9mb]->[0b]/[108.1mb]}{[old] [8.3gb]->[2.5gb]/[11.4gb]}
[2019-01-07T09:38:29,660][WARN ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][7635767] overhead, spent [3m] collecting in the last [3m]
[2019-01-07T09:41:36,064][INFO ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][young][7635951][3393561] duration [1.4s], collections [2]/[2.2s], total [1.4s]/[1.9d], memory [8.8gb]->[8.4gb]/[12.3gb], all_pools {[young] [398mb]->[4.8mb]/[865.3mb]}{[survivor] [108.1mb]->[64.2mb]/[108.1mb]}{[old] [8.3gb]->[8.4gb]/[11.4gb]}
[2019-01-07T09:41:36,064][WARN ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][7635951] overhead, spent [1.4s] collecting in the last [2.2s]
[2019-01-07T10:09:12,518][INFO ][o.e.c.m.MetaDataMappingService] [Bpmy3wE] [exceeded-2019.01.07/Op9v4qCiSy2CakCtRlRApA] update_mapping [sbc_event]
[2019-01-07T10:52:15,846][INFO ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][7640173] overhead, spent [439ms] collecting in the last [1.2s]
[2019-01-07T12:13:15,629][INFO ][o.e.c.m.MetaDataDeleteIndexService] [Bpmy3wE] [logstash-2019.01.03/pwdQEa96Rai9bXSaxWY5aA] deleting index
[2019-01-07T12:13:32,873][INFO ][o.e.c.m.MetaDataDeleteIndexService] [Bpmy3wE] [exceeded-2019.01.04/aE68UAMcR_e8vf1RQHA1sg] deleting index
[2019-01-07T12:13:34,780][INFO ][o.e.c.m.MetaDataDeleteIndexService] [Bpmy3wE] [collectd-2019.01.04/AfQVhKohRGmRZnKdnOuU3Q] deleting index
[2019-01-07T15:01:55,541][INFO ][o.e.m.j.JvmGcMonitorService] [Bpmy3wE] [gc][7654993] overhead, spent [252ms] collecting in the last [1s]
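To see how close the heap sits to its limit between these collections, the cat nodes API can be polled; a sketch, again assuming the default localhost:9200:

 curl 'localhost:9200/_cat/nodes?v&h=name,heap.current,heap.percent,heap.max,ram.percent'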

Elasticsearch (and Kibana) is very slow to work with. There is no problem with disk space, and the heap size is set to:

 -Xmx13g
 -Xms13g
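For JDK 8 (the stack traces further down show 1.8.0_191), GC logging can be enabled alongside these heap settings in jvm.options to see what the collector is actually doing; a sketch, with the log path just an example:

 -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps
 -XX:+PrintGCApplicationStoppedTime
 -Xloggc:/var/log/elasticsearch/gc.log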

Any idea what else to check? Maybe segment merging? Is Elasticsearch unable to keep up with the amount of incoming data? The merge settings are at their defaults.
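For reference, segment counts per index can be inspected with the cat API, and older read-only daily indices can be merged down to a single segment; a sketch (the index name is just an example following the daily pattern):

 curl 'localhost:9200/_cat/segments/logstash-2019.01.06?v'
 curl -XPOST 'localhost:9200/logstash-2019.01.06/_forcemerge?max_num_segments=1'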

Sometimes the node also dies:

[2019-01-08T11:40:06,424][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [9tKeRnI] failed to execute on node [9tKeRnIXTZKr4kNLIyol2A]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [9tKeRnI][127.0.0.1:9300][cluster:monitor/nodes/stats[n]] request_id [502158] timed out after [35292ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
[2019-01-08T11:40:22,619][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [9tKeRnI] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [9tKeRnIXTZKr4kNLIyol2A]
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:246) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:160) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:218) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:933) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [9tKeRnI][127.0.0.1:9300][cluster:monitor/nodes/stats[n]] request_id [502158] timed out after [35292ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) ~[elasticsearch-5.4.3.jar:5.4.3]
    ... 4 more
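When one of these nodes/stats timeouts appears, it may be worth checking at that moment whether the bulk thread pool is also backing up or rejecting requests; a rough check:

 curl 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected'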

It looks like your GC is very slow even though you have a quite small heap. Are you by any chance running this on a VM with memory ballooning, resulting in a lot of swapping?

Yes, it's a VM, but swapping is disabled.
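For completeness, swap and memory locking can be double-checked from inside the guest; a sketch (mlockall only reports true when bootstrap.memory_lock is enabled in elasticsearch.yml):

 free -m       # the Swap line should show 0 total
 swapon -s     # prints nothing when no swap is active
 curl 'localhost:9200/_nodes?filter_path=**.mlockall&pretty'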

Have you checked whether the VM itself is overcommitted with respect to memory? Where are you running this VM?

I don't think so:

vm.nr_overcommit_hugepages = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

cat /proc/sys/vm/overcommit_memory also returns zero.
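These sysctls only cover overcommit inside the guest, though; whether the KVM host overcommits the VM's memory has to be checked on the hypervisor side. A rough sketch (the domain name elk-vm is just a placeholder):

 lsmod | grep virtio_balloon   # in the guest: balloon driver loaded means the host can reclaim memory
 virsh dommemstat elk-vm       # on the KVM host: compare the actual and rss figures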

Any other ideas what to check? The hot threads command usually returns nothing, or at most something like this:

 1.8% (9.1ms out of 500ms) cpu usage by thread 'elasticsearch[9tKeRnI][bulk][T#4]'
 10/10 snapshots sharing following 39 elements
   sun.nio.ch.FileDispatcherImpl.force0(Native Method)
   sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
   sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:388)
   org.elasticsearch.index.translog.Checkpoint.write(Checkpoint.java:131)
   org.elasticsearch.index.translog.TranslogWriter.writeCheckpoint(TranslogWriter.java:312)
   org.elasticsearch.index.translog.TranslogWriter.syncUpTo(TranslogWriter.java:273)
   org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:520)
   org.elasticsearch.index.translog.Translog.ensureSynced(Translog.java:545)
   org.elasticsearch.index.shard.IndexShard$1.write(IndexShard.java:1684)
   org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:107)
    org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcess(AsyncIOProcessor.java:99)
   org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:82)
   org.elasticsearch.index.shard.IndexShard.sync(IndexShard.java:1706)
   org.elasticsearch.action.support.replication.TransportWriteAction$AsyncAfterWriteAction.run(TransportWriteAction.java:307)
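That stack is the translog fsync path (Checkpoint.write -> FileChannelImpl.force), i.e. the bulk thread waiting for the data disk to acknowledge a sync. The disk's write latency can be watched while indexing; a sketch using iostat:

 iostat -x 5    # watch w_await and %util for the device that holds path.data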

Where are you running this VM?

It's a Linux VM with 29 GB of memory, running on KVM.

And you have verified that KVM is not configured to overcommit memory? If so, I am not sure what could cause GC to take so long.

Yes, KVM memory is not the problem, but it seems that the disk is too slow for writes.
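If the per-request translog fsync is what the disk cannot keep up with, one setting that can be tested is async translog durability on the write-heavy daily index; a sketch (the index name is just an example, and async durability can lose up to index.translog.sync_interval, 5s by default, of acknowledged writes if the node crashes):

 curl -XPUT 'localhost:9200/logstash-2019.01.08/_settings' -H 'Content-Type: application/json' -d '
 {
   "index.translog.durability": "async"
 }'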
