How to right-size the heap, long GC, ES 2.4

Hi There

I have recently deployed 2 separate ES 2.4 clusters, 2 nodes in each cluster, running on Windows Server 2012 R2 Standard as virtual machines on VMware, with 12 cores, 24 GB RAM and 1 TB disk each. I am using Java 1.8, update 102.

The clusters are index-heavy, receiving around 300-400 million entries per day. Logs are sent from Filebeat to Logstash, which forwards them to ES. The indexing rate is around 3,000-7,000 documents per second.

I have reduced the shard count from 5 to 2 for my index, and I am creating a new index every day.
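For anyone wondering how a daily index ends up with 2 shards: this is normally done with an index template. A minimal sketch (the template name and index pattern are placeholders, not necessarily my exact names):

import json
import urllib.request

# Hypothetical template: any index matching dnslogs-* gets 2 primary shards.
template = {
    "template": "dnslogs-*",
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    }
}

req = urllib.request.Request(
    "http://localhost:9200/_template/dnslogs_2_shards",
    data=json.dumps(template).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())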

My main problem: what is the correct approach to determining the right ES heap size?

I started out with a heap size of around 8 GB. That let the cluster run for approximately one day before it hit a long old-generation GC, which took several minutes, and I then got error messages that the master had left.

I am now at a heap size of 16 GB, which lets it run for a few days before again being hit by a long old-generation GC. Last night I saw an old GC taking a whopping 42 minutes. I also see heap usage slowly increasing over time, like some kind of memory leak.

So I tried making a script that monitors the heap and gracefully restarts the node when usage reaches 90% or more, but I don't think this is the correct approach.
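Something like this rough sketch (the host is a placeholder and the actual restart step is left out):

import json
import urllib.request

NODE = "http://localhost:9200"   # placeholder: one node's HTTP endpoint
THRESHOLD = 90                   # restart when heap usage reaches this percentage

# JVM stats for the local node only.
with urllib.request.urlopen(NODE + "/_nodes/_local/stats/jvm") as resp:
    stats = json.load(resp)

for node in stats["nodes"].values():
    used_pct = node["jvm"]["mem"]["heap_used_percent"]
    print(node["name"], "heap used:", used_pct, "%")
    if used_pct >= THRESHOLD:
        # The real script would gracefully restart the Windows service here.
        print("heap above threshold, node", node["name"], "should be restarted")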

I would like to think that ES is mature enough not to suffer from this problem.

So my question: what is the correct way of sizing the heap to avoid long GCs? What other settings should I be concerned about in an index-heavy cluster? I have done a lot of googling, of course, but I need a little help :wink:

We would like to add more logs and more nodes to the clusters, but I need them to be more stable before that happens.

Regards


How much data is in the cluster, how many indices, how many shards?
What does your mapping look like? Do you have lots of fielddata taking up heap?

Hi Mark

I have 2 daily indices: one for my DNS logs and one for Winlogbeat, i.e. Windows event logs.

The DNS index is around 300 million documents per day, 100 GB, 2 shards, 1 replica; the Winlogbeat index is around 25 million documents, 40 GB, 5 shards, 1 replica.

I prune all indexes older than 8 days.

If I monitor with Marvel, the fielddata graph is consistently at 0.
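A quick way to double-check that outside Marvel is the node stats API; a minimal sketch (host is a placeholder):

import json
import urllib.request

# Fielddata memory per node; it should stay near 0 if nothing loads fielddata.
url = "http://localhost:9200/_nodes/stats/indices/fielddata"
with urllib.request.urlopen(url) as resp:
    stats = json.load(resp)

for node in stats["nodes"].values():
    fd = node["indices"]["fielddata"]
    print(node["name"], "fielddata bytes:", fd["memory_size_in_bytes"],
          "evictions:", fd["evictions"])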

Regards

What sort of queries are you doing against it?

Hi

Just very simple queries through Kibana 4.6. Nothing fancy.

Things like who queried a given DNS domain, or which computers have a given event_id. Very basic stuff.

And very few queries throughout the day. All queries respond quickly and smoothly.

Regards

What does your node configuration look like?

Hi

What exactly would you like to know?

Regards

Anything that is not default.

Anything that is not default ...

Well, I have changed Logstash parameters to increase indexing performance:

-w 24 -b 3048 (workers = 24, batch size = 3048)

I have also increased the output parameters in logstash.json:

workers = 2, flush_size = 3048

These are the non-default settings from elasticsearch.yml:

# Indexing settings for writes
index.refresh_interval: 30s

path.data: d:\data
path.logs: c:\logs

cluster.routing.allocation.cluster_concurrent_rebalance: 4
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 6
indices.recovery.max_bytes_per_sec: 125mb
indices.recovery.concurrent_streams: 5
indices.memory.index_buffer_size: 30%
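For completeness, the settings each node actually loaded can be checked with the nodes info API; a small sketch (host is a placeholder):

import json
import urllib.request

# The "settings" metric of the nodes info API echoes back what each node
# actually loaded, which helps spot a node that missed a config change.
with urllib.request.urlopen("http://localhost:9200/_nodes/settings") as resp:
    info = json.load(resp)

for node in info["nodes"].values():
    print(node["name"], json.dumps(node["settings"], indent=2))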

Hi

We had another long GC, which caused the cluster to fail.

[2016-09-19 10:24:52,142][WARN ][monitor.jvm ] [node1] [gc][young][111431][3834] duration [2.9s], collections [1]/[3.5s], total [2.9s]/[3.2m], memory [13.7gb]->[9.3gb]/[19.3gb], all_pools {[young] [4.9gb]->[108.1mb]/[5.3gb]}{[survivor] [606.8mb]->[682.6mb]/[682.6mb]}{[old] [8.1gb]->[8.6gb]/[13.3gb]}
[2016-09-19 10:25:04,680][INFO ][monitor.jvm ] [node1] [gc][young][111443][3835] duration [908ms], collections [1]/[1.2s], total [908ms]/[3.2m], memory [14.5gb]->[9.7gb]/[19.3gb], all_pools {[young] [5.2gb]->[113mb]/[5.3gb]}{[survivor] [682.6mb]->[682.6mb]/[682.6mb]}{[old] [8.6gb]->[8.9gb]/[13.3gb]}
[2016-09-19 10:25:21,951][WARN ][monitor.jvm ] [node1] [gc][young][111457][3836] duration [2.9s], collections [1]/[4s], total [2.9s]/[3.3m], memory [14.2gb]->[10.5gb]/[19.3gb], all_pools {[young] [4.5gb]->[87.1mb]/[5.3gb]}{[survivor] [682.6mb]->[682.6mb]/[682.6mb]}{[old] [8.9gb]->[9.8gb]/[13.3gb]}
[2016-09-19 10:25:35,434][WARN ][monitor.jvm ] [node1] [gc][young][111469][3837] duration [1.8s], collections [1]/[2.3s], total [1.8s]/[3.3m], memory [15.5gb]->[11gb]/[19.3gb], all_pools {[young] [5gb]->[41.2mb]/[5.3gb]}{[survivor] [682.6mb]->[682.6mb]/[682.6mb]}{[old] [9.8gb]->[10.3gb]/[13.3gb]}
[2016-09-19 10:25:49,194][WARN ][monitor.jvm ] [node1] [gc][young][111481][3838] duration [2.3s], collections [1]/[2.5s], total [2.3s]/[3.3m], memory [16.2gb]->[12gb]/[19.3gb], all_pools {[young] [5.2gb]->[34.1mb]/[5.3gb]}{[survivor] [682.6mb]->[682.6mb]/[682.6mb]}{[old] [10.3gb]->[11.3gb]/[13.3gb]}
[2016-09-19 10:26:01,727][WARN ][monitor.jvm ] [node1] [gc][young][111491][3839] duration [2.2s], collections [1]/[3.1s], total [2.2s]/[3.4m], memory [16.4gb]->[12.6gb]/[19.3gb], all_pools {[young] [4.4gb]->[35mb]/[5.3gb]}{[survivor] [682.6mb]->[682.6mb]/[682.6mb]}{[old] [11.3gb]->[11.9gb]/[13.3gb]}
[2016-09-19 10:26:13,914][INFO ][monitor.jvm ] [node1] [gc][young][111503][3840] duration [762ms], collections [1]/[1s], total [762ms]/[3.4m], memory [17.8gb]->[12.9gb]/[19.3gb], all_pools {[young] [5.1gb]->[33.4mb]/[5.3gb]}{[survivor] [682.6mb]->[682.6mb]/[682.6mb]}{[old] [11.9gb]->[12.2gb]/[13.3gb]}
[2016-09-19 10:26:27,011][WARN ][monitor.jvm ] [node1] [gc][young][111515][3841] duration [1.1s], collections [1]/[1.9s], total [1.1s]/[3.4m], memory [17.7gb]->[13.2gb]/[19.3gb], all_pools {[young] [4.7gb]->[2.8mb]/[5.3gb]}{[survivor] [682.6mb]->[655.2mb]/[682.6mb]}{[old] [12.2gb]->[12.5gb]/[13.3gb]}
[2016-09-19 10:32:45,761][WARN ][monitor.jvm ] [node1] [gc][old][111536][3] duration [5.9m], collections [1]/[5.9m], total [5.9m]/[5.9m], memory [18.5gb]->[5gb]/[19.3gb], all_pools {[young] [5.2gb]->[108.8mb]/[5.3gb]}{[survivor] [655.2mb]->[0b]/[682.6mb]}{[old] [12.5gb]->[4.9gb]/[13.3gb]}

This time it took 5.9 minutes.

Regards

That is indeed a very long GC. Do you perhaps have swap enabled or is the host overcommitted with respect to memory?

Hi

The same behaviour occurs on 2 separate ES clusters.

The first cluster has 24 GB RAM on each node and the second cluster has 32 GB RAM.

I have changed the configuration today.

  1. bootstrap.mlockall: true
  2. Memory reservation on VMWare

So I need to wait to see the effect of these settings, if any...
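To verify that the mlockall change took effect, I will check the nodes info API, with something like this sketch (host is a placeholder):

import json
import urllib.request

# Each node reports whether it managed to lock its memory (bootstrap.mlockall).
with urllib.request.urlopen("http://localhost:9200/_nodes/process") as resp:
    info = json.load(resp)

for node in info["nodes"].values():
    print(node["name"], "mlockall:", node["process"]["mlockall"])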

I can provide a JVM heap dump, if anyone wants to take a look.

Regards

Having memory ballooning (on the VMware side) is bad.


Well, I found the root cause of this problem.

It turned out to be the garbage collector used by ES 2.4.0 on Windows: the JVM was running with the serial collector. I installed the new 2.4.1 update of ES on my clusters and finally I see the expected sawtooth pattern in heap usage: it climbs to around 75% and then drops, as expected.
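For anyone hitting the same thing: one way to spot it is to check which collectors the JVM actually reports, for example via the nodes info API. A minimal sketch (host is a placeholder):

import json
import urllib.request

# With the default CMS setup the JVM should report ParNew + ConcurrentMarkSweep;
# "Copy" / "MarkSweepCompact" would mean it fell back to the serial collector.
with urllib.request.urlopen("http://localhost:9200/_nodes/jvm") as resp:
    info = json.load(resp)

for node in info["nodes"].values():
    print(node["name"], "gc collectors:", node["jvm"]["gc_collectors"])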

I have to admit that this bug is so bad and so embarrassing for the quality of ES on Windows that I am very surprised it could go unnoticed for so long.

I would like some comment from ES on how such a fatal bug was able to pass your quality tests. Don't you have any Windows clusters running ES? Why is ES able to run at all with the serial garbage collector, without a giant warning in the logs? Better yet, it should refuse to run at all with the serial garbage collector enabled.

Regards

Why is ES able to run at all with the serial garbage collector, without a giant warning in the logs? Better yet, it should refuse to run at all with the serial garbage collector enabled.

We agree. See "Add serial collector bootstrap check" by jasontedor, elastic/elasticsearch PR #20558 on GitHub.

I would like some comment from ES on how such a fatal bug was able to pass your quality tests.

Oversight. If you configure a particular garbage collector in your application, do you have a check to ensure that the JVM is actually using that collector? Probably not. It turns out we should. Live and learn.