Garbage Collection blackout

gtorrance · April 7, 2017, 6:07pm

Hi all,

I have set up an ELK server (mostly using defaults) on a VM with 2GB of memory and 200GB of disk. It has been loading about 250 log files from another server using Filebeat. (The files are only about 300KB each.)

Performance has been really terrible, though, and log files are showing numerous errors.

Also (and the reason for this message), I've been seeing extended periods where the ES server appears to almost completely black out while doing GC. Notice the logs below, which seem to indicate an 8.8-minute period of GC. (I've seen other occurrences of this with over 20 minutes of GC.)

Below are two images from X-Pack monitoring (for the same period). Notice the 8 minutes of nothingness.

This isn't healthy, right? Thoughts?

Thanks,
Greg

[2017-04-07T13:34:15,260][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5157] overhead, spent [1s] collecting in the last [1.1s]
[2017-04-07T13:34:17,931][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][young][5159][3899] duration [1.5s], collections [1]/[1.6s], total [1.5s]/[29.2m], memory [1.9gb]->[1.9gb]/[1.9gb], all_pools {[young] [62mb]->[514kb]/[66.5mb]}{[survivor] [8.3mb]->[8.3mb]/[8.3mb]}{[old] [1.9gb]->[1.9gb]/[1.9gb]}
[2017-04-07T13:34:17,931][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5159] overhead, spent [1.5s] collecting in the last [1.6s]
[2017-04-07T13:34:25,167][INFO ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5166] overhead, spent [498ms] collecting in the last [1.2s]
[2017-04-07T13:34:26,169][INFO ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5167] overhead, spent [377ms] collecting in the last [1s]
[2017-04-07T13:34:30,170][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5171] overhead, spent [590ms] collecting in the last [1s]

*** blackout for 8 minutes ***

[2017-04-07T13:43:24,673][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][old][5175][12] duration [8.8m], collections [2]/[8.8m], total [8.8m]/[11.1m], memory [1.9gb]->[679mb]/[1.9gb], all_pools {[young] [16.5mb]->[958.5kb]/[66.5mb]}{[survivor] [7mb]->[0b]/[8.3mb]}{[old] [1.9gb]->[678.8mb]/[1.9gb]}
[2017-04-07T13:43:24,765][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5175] overhead, spent [8.8m] collecting in the last [8.8m]
[2017-04-07T13:43:28,693][INFO ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][young][5178][3908] duration [888ms], collections [1]/[1.9s], total [888ms]/[29.3m], memory [718.1mb]->[687.5mb]/[1.9gb], all_pools {[young] [39.3mb]->[36.2kb]/[66.5mb]}{[survivor] [0b]->[8.3mb]/[8.3mb]}{[old] [678.8mb]->[679.1mb]/[1.9gb]}
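A minimal Python sketch for spotting blackouts like this one: it reads the leading `[yyyy-MM-ddTHH:mm:ss,SSS]` timestamp of each log line and reports any silence longer than a threshold. It assumes the log line layout shown above; the 60-second default is an arbitrary choice, and `find_gaps` is a hypothetical helper, not part of Elasticsearch.

```python
import re
from datetime import datetime

# Leading timestamp of an Elasticsearch log line, e.g. "[2017-04-07T13:34:30,170]"
# (note the comma before the milliseconds).
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")


def find_gaps(lines, min_gap_seconds=60.0):
    """Return (previous_ts, next_ts, gap_seconds) for every silence longer
    than min_gap_seconds between consecutive timestamped lines."""
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # skip annotations and lines without a timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S,%f")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > min_gap_seconds:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps


if __name__ == "__main__":
    sample = [
        "[2017-04-07T13:34:30,170][WARN ][o.e.m.j.JvmGcMonitorService] ...",
        "*** blackout for 8 minutes ***",
        "[2017-04-07T13:43:24,673][WARN ][o.e.m.j.JvmGcMonitorService] ...",
    ]
    for start, end, secs in find_gaps(sample):
        print(f"{start} -> {end}: {secs:.1f}s of silence")
```

On the two lines that bracket the blackout above (13:34:30,170 and 13:43:24,673) it reports a gap of about 534.5 seconds, which lines up with the 8.8-minute old-generation collection logged at 13:43:24.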

nik9000 · April 7, 2017, 6:31pm

Yeah, that isn't healthy. This is your smoking gun:

[2017-04-07T13:43:24,765][WARN ][o.e.m.j.JvmGcMonitorService] [FBp7aLX] [gc][5175] overhead, spent [8.8m] collecting in the last [8.8m]
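To put a number on that line, here is a small sketch that turns an `overhead, spent [X] collecting in the last [Y]` message into the fraction of wall-clock time the JVM spent in GC. It assumes only the unit suffixes seen in this thread (`ms`, `s`, `m`) plus `h`; `gc_overhead` is a hypothetical helper.

```python
import re

# Seconds per unit suffix, as printed in the GC monitor messages above.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

OVERHEAD_RE = re.compile(
    r"spent \[([\d.]+)(ms|s|m|h)\] collecting in the last \[([\d.]+)(ms|s|m|h)\]"
)


def gc_overhead(line):
    """Fraction of the reported window spent collecting, or None if the
    line is not a GC overhead message."""
    m = OVERHEAD_RE.search(line)
    if m is None:
        return None
    spent = float(m.group(1)) * _UNIT_SECONDS[m.group(2)]
    window = float(m.group(3)) * _UNIT_SECONDS[m.group(4)]
    return spent / window


if __name__ == "__main__":
    line = "[gc][5175] overhead, spent [8.8m] collecting in the last [8.8m]"
    print(gc_overhead(line))
```

The 8.8m/8.8m line comes out at 1.0, meaning the node did nothing but collect for the whole window. For comparison, the INFO lines at 13:34:25 and 13:34:26 come out at about 0.42 and 0.38, and the WARN line at 13:34:30 at 0.59.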

A couple of things:

If you have a 2GB heap on a machine with 2GB of RAM, then nothing is left over for the filesystem cache, so your performance is going to be terrible. We recommend using no more than half the RAM for the heap in general.
That recommendation is trouble, given that you can't run what you have with 2GB, much less 1GB. It looks like you have 1,205 shards, which is quite a bit. Small indexes should be declared with 1 shard. I'd try lowering that number first and then investigate further.
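Both suggestions can be sketched in Python: size the heap at half of physical RAM (with `-Xms` and `-Xmx` set to the same value, as Elastic recommends, so the heap is never resized), and build an index template body that gives new indices a single primary shard. The `filebeat-*` pattern is an assumption about how the Filebeat indices are named, the template name in the comment is hypothetical, and the body uses the 5.x `template` key (6.0 and later call it `index_patterns`).

```python
import json


def jvm_heap_options(ram_mb):
    """Heap flags for jvm.options: half of physical RAM, min equal to max."""
    heap_mb = ram_mb // 2
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


def single_shard_template(index_pattern, order=1):
    """Index template body giving newly created matching indices one primary shard.

    Uses the Elasticsearch 5.x "template" key for the index pattern.
    When several templates match an index, higher "order" values are applied
    later and override lower ones.
    """
    return {
        "template": index_pattern,
        "order": order,
        "settings": {"number_of_shards": 1},
    }


if __name__ == "__main__":
    print(jvm_heap_options(2048))  # 2GB VM
    print(jvm_heap_options(8192))  # 8GB VM
    # Body for e.g. PUT _template/filebeat-single-shard (hypothetical name)
    print(json.dumps(single_shard_template("filebeat-*"), indent=2))
```

A template only applies when an index is created, so existing indices keep their current shard count.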

We've talked about adding a check on the number of shards per node based on memory, but we have never built it. Situations like yours are a good reason to have one, though.
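Since that check doesn't exist yet, here is a rough sketch of what it might look like. `NodeStats`, `overloaded_nodes` and the budget of 20 shards per GB of heap are all assumptions for illustration; pick a budget that suits your own data. The example uses the node name and the 1.9gb heap ceiling from the GC logs above, together with the 1,205 shards mentioned in this reply.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    shard_count: int
    heap_max_gb: float


def overloaded_nodes(nodes, max_shards_per_gb=20.0):
    """Return (name, shards_per_gb) for every node above the assumed budget."""
    result = []
    for node in nodes:
        density = node.shard_count / node.heap_max_gb
        if density > max_shards_per_gb:
            result.append((node.name, density))
    return result


if __name__ == "__main__":
    nodes = [NodeStats("FBp7aLX", 1205, 1.9)]
    for name, density in overloaded_nodes(nodes):
        print(f"{name}: {density:.0f} shards per GB of heap")
```

For that node it reports roughly 634 shards per GB of heap, far above the assumed budget.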

gtorrance · April 10, 2017, 3:52pm

Thanks, Nik!

I increased the VM memory to 8GB and everything seems to be running smoothly now. Also, it looks like disk space allocated to /var/lib/elasticsearch was running low, so I increased that, too. No longer seeing any errors in elasticsearch.log.
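Low disk space is worth watching too: Elasticsearch stops allocating shards to a node once its disk passes the low watermark and tries to move shards away past the high one. A minimal sketch using Python's `shutil.disk_usage`, assuming the default watermarks of 85% and 90% and the `/var/lib/elasticsearch` data path from this post; `disk_status` and `check_data_path` are hypothetical helpers.

```python
import shutil

# Assumed thresholds matching Elasticsearch's default disk watermarks;
# adjust them if your cluster overrides the defaults.
LOW_WATERMARK = 0.85
HIGH_WATERMARK = 0.90


def disk_status(used_bytes, total_bytes):
    """Classify disk usage against the assumed watermarks."""
    ratio = used_bytes / total_bytes
    if ratio > HIGH_WATERMARK:
        return "high"  # past the high watermark
    if ratio > LOW_WATERMARK:
        return "low"  # past the low watermark
    return "ok"


def check_data_path(path="/var/lib/elasticsearch"):
    """Check the filesystem that holds the given Elasticsearch data path."""
    usage = shutil.disk_usage(path)
    return disk_status(usage.used, usage.total)
```

Calling `check_data_path()` on the node returns `"ok"`, `"low"` or `"high"`, which could feed a simple cron alert.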

I appreciate the help!

Greg

system · May 8, 2017, 4:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
