Overhead and heap issues

Hi all,

I'm having big problems with the Java heap size and my 3 Elasticsearch nodes running out of heap space. I'd been running with the default 1 GB heap, but as that started to fill I increased it to 4 GB (half of the available memory on the server). Since making that adjustment I'm getting more issues: the cluster runs for about half a day before filling the Java heap space, eventually timing out and then stopping altogether.
I'm running these 3 nodes on Windows 2012 R2 and have used the elasticsearch-service.bat manager to adjust the -Xms6g and -Xmx6g options, as well as changing the "Initial memory pool" and the "Maximum memory pool" to 6144MB. I also changed the jvm.options file for good measure (I realise the Windows service doesn't pick up changes to that file), yet my logs are still full of overhead errors before the node finally quits.
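
For reference, the heap lines I edited in config/jvm.options look like this (the values here are just an example, and as I said, the Windows service may be ignoring this file entirely):

# heap size – minimum and maximum should be set to the same value
-Xms4g
-Xmx4g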

[2018-01-31T18:30:21,491][INFO ][o.e.m.j.JvmGcMonitorService] [server1] [gc][6178] overhead, spent [382ms] collecting in the last [1s]
[2018-01-31T18:30:22,507][WARN ][o.e.m.j.JvmGcMonitorService] [server1] [gc][6179] overhead, spent [539ms] collecting in the last [1s]
[2018-01-31T18:35:13,640][INFO ][o.e.m.j.JvmGcMonitorService] [server1] [gc][6467] overhead, spent [361ms] collecting in the last [1s]
[2018-01-31T18:35:14,656][INFO ][o.e.m.j.JvmGcMonitorService] [server1] [gc][6468] overhead, spent [503ms] collecting in the last [1s]
[2018-01-31T18:35:15,672][INFO ][o.e.m.j.JvmGcMonitorService] [server1] [gc][6469] overhead, spent [427ms] collecting in the last [1s]

Can anyone point out anything that I might be missing? None of my configs have changed in any significant way; all I've done is increase the heap space for Java.

Thanks for any help you can offer.

How many shards do you have? What version are you on?

V6.0.0
active_primary_shards: 866,
active_shards: 1732

Is the first step an upgrade?

I think you have too many shards given the size of the cluster and the heap space available. Have a read of this blog post on shards and sharding practices.
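
As a rough way to see how over-sharded you are, something along these lines with the _cat API should show the on-disk size of each individual shard and which node it sits on:

GET _cat/shards?v&h=index,shard,prirep,store,node

If most of those shards are only a few MB each, that's a lot of per-shard overhead for very little data.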

So I've used Cerebro to get a better overview of the issue. I can now watch the JVM heap on each node slowly climbing, sometimes to 90% on one of the nodes. I can see that I have 42,087,876 docs and 1,782 shards spread across the 3 nodes, with a total size of 58GB and each node now on 4GB of JVM heap. Based on the above, can you confirm that the reason I am seeing all these timeouts is down to the sheer number of shards/docs?
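
(For what it's worth, I assume I could watch the same heap numbers without Cerebro with a _cat request along these lines — the column names are my best guess:

GET _cat/nodes?v&h=name,heap.percent,heap.current,heap.max

and that matches what Cerebro is showing me.)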

If this is the case, what are the steps to reduce them? I only have 179 indices. Is it a config issue that I've screwed up on the initial build?

Thanks.

It certainly looks that way. If you have time-based data, use the _shrink API to reduce the shard counts of older indices.
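
Roughly, the shrink flow looks like this (the index names below are placeholders, and "server1" is just one of your node names): first make the index read-only and move a copy of every shard onto a single node, then shrink it into a new index with fewer primaries.

PUT /logstash-2018.01.20/_settings
{
  "settings" : {
    "index.routing.allocation.require._name" : "server1",
    "index.blocks.write" : true
  }
}

POST /logstash-2018.01.20/_shrink/logstash-2018.01.20-shrunk
{
  "settings" : {
    "index.number_of_shards" : 1,
    "index.number_of_replicas" : 1
  }
}

The target shard count has to divide evenly into the source's shard count. Once the shrink finishes you can delete the original index and remove the allocation requirement from the new one.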

Let's say I wanted to just start again: wipe out all my indices, let Logstash continue throwing data at ES, and accept my losses. How do I prevent this from happening in the future? From what I've read, I'll need to specify the number of shards in an index template from Logstash. So am I right in thinking that I should design a template for every index? I'd assumed I should just use the one that Logstash chooses for me based on the content I throw at it. The only specific template I use is one I cobbled together for our Palo firewalls (shown below for reference, with a rough sketch of what I think a shard-count template would look like after it). I guess I'm at a loss to understand exactly how 3 nodes can't take the small amount of data I'm pushing at them. I've even reduced the Curator cleanup to delete indices older than 7 days, so I only have 5 indices coming in and no more than 7 days of each of them kept.

Palo template

{
  "template" : "palo-firewall-traffic*",
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "_default_" : {
       "_all" : {"enabled" : true},
       "dynamic_templates" : [ {
         "message_field" : {
           "match" : "message",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "text", "norms" : false, "index" : true
           }
         }
       }, {
         "strings" : {
           "match" : "*",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "text", "norms" : false, "index" : true,
               "fields" : {
                 "text" : {"type" : "keyword", "index" : true, "ignore_above" : 256}
               }
           }
         }
       } ],
       "properties" : {
         "@version": { "type": "keyword", "index": true},
         "geoip"  : {
           "type" : "object",
             "dynamic": true,
             "properties" : {
               "location" : { "type" : "geo_point" }
             }
         },
         "SourceGeo"  : {
           "type" : "object",
             "dynamic": true,
             "properties" : {
               "location" : { "type" : "geo_point" }
             }
         },
         "DestinationGeo"  : {
           "type" : "object",
             "dynamic": true,
             "properties" : {
               "location" : { "type" : "geo_point" }
             }
         }
       }
    }
  }
}
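
And this is roughly what I imagine the shard-count template would look like — completely untested, and the template name and index patterns are just placeholders for whatever my 5 incoming indices are actually called:

PUT _template/one_shard_default
{
  "index_patterns" : ["logstash-*", "palo-firewall-traffic*"],
  "order" : 0,
  "settings" : {
    "index.number_of_shards" : 1,
    "index.number_of_replicas" : 1
  }
}

My thinking is to give it a low order so anything more specific (like the Palo template above) can still override it — is that the right approach?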
