35 shards but maxing out JVM heap


#1

Hi,

I've been experiencing long term issues with Elasticsearch and one of my 3 node setups running out of heap space (the specific node varies each time it seems). The specifics of my setup are as follows:

  • OS: Windows 2012 R2
  • ES version: 6.2.1
  • Logstash version: 6.2.1
  • Kibana version: 6.2.1
  • Winlogbeat version: 6.2.1
  • Number of nodes: ES: 3 - LS: 1 - KB: 1
  • JVM Heap size on each ES node: 4GB - Total: 12GB
  • Number of shards per index: 3 (1 primary, 2 replica)
  • Index timescale: 4 indices per day.
  • Total shards per day: 12
  • Retention period before curating: 1 week (total shards per week: 84). Currently the number of shards doesn't grow beyond this before being deleted. In time I will increase this time frame.
  • Java version on ES nodes: 1.8.0_144-b01
  • Java version on LS: 1.8.0_151-b12

Despite setting all the index templates to 1 primary shard & 2 replicas (in the hope this would reduce the load on the JVM heap), this hasn't happened. The heap is still climbing so high one usually one ES node (varies as to which one), the Garbage Collection is timing out before eventually bringing down the node itself. The only way to get things working again is to restart the host or to force the cluster routing to enable as shown below:

curl -XPUT 'es-01.myserver.com:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
"transient": {
"cluster.routing.allocation.enable": "all"
}
}
'

The daily indices I am building are tiny, ranging from 300Kb to no more than 3GB. I've disabled all dashboards in Kibana so nothing is pulling data from the nodes. Still JVM heap space is climbing and errors such as:

[2018-02-26T11:10:09,608][INFO ][o.e.m.j.JvmGcMonitorService] [es-01] [gc][5867] overhead, spent [351ms] collecting in the last [1s]

are starting to fill the ES nodes logs as it slowly gets more overloaded.

I'm at a loss now as to explain why. I was pretty confident that the reduction of shards per index would resolve the issue but it seems not so if anyone can suggest anything else I can try, I'd appreciate the help.

Thanks.


(Aaron Mildenstein) #2

How many shards per node do you have? You've told me how many you have per week. How many total shards do you have per node?

With only 4G heap, the number of shards per node you can manage will be surprisingly small. A 31G heap can safely address 600 open shards per node. You might only be able to handle as many as 75 open shards per node.

The bottom line is that there is a heap overhead for each open shard. Exceeding the threshold increases memory pressure on your nodes (frequent garbage collection is a symptom). You should see the heap usage between 30% and 75% at all times. If it is constantly hanging around 70% usage—and not freeing up enough memory to drop down to at least 50% usage after a GC—then that's indicative of memory pressure, which might result in the symptoms you're seeing. 4G heaps are quite small for a production system.


#3

Hi Aaron,

Thanks for your reply. Shards per node currently sit at 15. Over the course of one week they would grow to 35 per node (5 indices currently split across 7 days before curating).

I'm now running with 8GB per node, 24GB across the cluster.Still, the ES logs are full of:

[2018-02-27T13:30:53,147][INFO ][o.e.m.j.JvmGcMonitorService] [es-01-srv123] [gc][1897] overhead, spent [524ms] collecting in the last [1.1s]

By your memory calculations, about 7.75GB should be enough for 150 active shards. As I'm running with 8GB now, I can't figure out what's left to test. Should I throw even more memory at it? My issue is that I ran this cluster for months with a heap of 1GB on each node. It ran to over 100 shards and no issues. It's only within the last month or so this has become an issue which has prompted me to start again, with a smaller shard allocation setup in the hope it was the 1117 shards I'd ended up with that was the issue.


(Aaron Mildenstein) #4

Look at the JVM results on each box over time (plotted in a graph, preferably, like in the Kibana/X-Pack Monitoring page). What do you see your heap used rate at? You should see the maximum usage be no more than 75% of the assigned memory, and when it hits that 75%, it should perform a full GC and reduce the memory down to at least 50% free available heap. If it doesn't (and this is my guess), and it only frees memory to something like 70%, or worse, that it gets to 80%+ usage, and can't free up more than 5% - 10%, then you definitely have memory pressure. Since you're getting those messages in the logs, you're doing GC nearly constantly, which is why I am guessing those numbers.

Some increase in that heap usage is leading to the GCs is my guess.


(Christian Dahlqvist) #5

What is the full output of the cluster stats API?


#6

Hi Christian,

As requested, shown below. Currently I've pulled all the ES configs from the nodes and rebuilt them. I'm watching one of my nodes sitting on 74% (4.4GB used of 5.9GB available and it's been running for 1 hour. The other 2 nodes site on 18% mem use and 23%. I'm seeing the same garbage collection info in the logs. However, the only difference that I've just witnessed is the the heap has dropped from 75% to 7%. Assuming the gc did it's job but it leaves the question of should gc be running only into 1 hour of the node coming up?

{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "ES-Cluster",
  "timestamp": 1519828375107,
  "status": "green",
  "indices": {
    "count": 20,
    "shards": {
      "total": 59,
      "primaries": 20,
      "replication": 1.95,
      "index": {
        "shards": {
          "min": 2,
          "max": 3,
          "avg": 2.95
        },
        "primaries": {
          "min": 1,
          "max": 1,
          "avg": 1
        },
        "replication": {
          "min": 1,
          "max": 2,
          "avg": 1.95
        }
      }
    },
    "docs": {
      "count": 15517262,
      "deleted": 9
    },
    "store": {
      "size": "34gb",
      "size_in_bytes": 36524282513
    },
    "fielddata": {
      "memory_size": "1.3mb",
      "memory_size_in_bytes": 1392200,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "5.1mb",
      "memory_size_in_bytes": 5392376,
      "total_count": 3045,
      "hit_count": 728,
      "miss_count": 2317,
      "cache_size": 100,
      "cache_count": 186,
      "evictions": 86
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 557,
      "memory": "78.7mb",
      "memory_in_bytes": 82557984,
      "terms_memory": "59mb",
      "terms_memory_in_bytes": 61925462,
      "stored_fields_memory": "12.5mb",
      "stored_fields_memory_in_bytes": 13183152,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "1.5mb",
      "norms_memory_in_bytes": 1663616,
      "points_memory": "3.7mb",
      "points_memory_in_bytes": 3970686,
      "doc_values_memory": "1.7mb",
      "doc_values_memory_in_bytes": 1815068,
      "index_writer_memory": "43.6mb",
      "index_writer_memory_in_bytes": 45740812,
      "version_map_memory": "0b",
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set": "0b",
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": 1519823360776,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 3,
      "data": 3,
      "coordinating_only": 0,
      "master": 3,
      "ingest": 3
    },
    "versions": [
      "6.2.1"
    ],
    "os": {
      "available_processors": 12,
      "allocated_processors": 12,
      "names": [
        {
          "name": "Windows Server 2012 R2",
          "count": 3
        }
      ],
      "mem": {
        "total": "35.9gb",
        "total_in_bytes": 38652887040,
        "free": "10.9gb",
        "free_in_bytes": 11714117632,
        "used": "25gb",
        "used_in_bytes": 26938769408,
        "free_percent": 30,
        "used_percent": 70
      }
    },
    "process": {
      "cpu": {
        "percent": 31
      },
      "open_file_descriptors": {
        "min": -1,
        "max": -1,
        "avg": 0
      }
    },
    "jvm": {
      "max_uptime": "1.8h",
      "max_uptime_in_millis": 6547528,
      "versions": [
        {
          "version": "1.8.0_144",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "25.144-b01",
          "vm_vendor": "Oracle Corporation",
          "count": 3
        }
      ],
      "mem": {
        "heap_used": "6.5gb",
        "heap_used_in_bytes": 6994533952,
        "heap_max": "17.9gb",
        "heap_max_in_bytes": 19222757376
      },
      "threads": 199
    },
    "fs": {
      "total": "299.9gb",
      "total_in_bytes": 322113097728,
      "free": "263.7gb",
      "free_in_bytes": 283169873920,
      "available": "263.7gb",
      "available_in_bytes": 283169873920
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "netty4": 3
      },
      "http_types": {
        "netty4": 3
      }
    }
  }
}

#7

So I'm back to the gc warnings. I've not nothing left to try to make ES stand up longer than 3 hours.

All I have left now is to wipe out the whole cluster and start from scratch. The problem with that is that all I'm going to do is rebuild the server and use the config I have to set up a new one. Essentially, just moving whatever the issue is, onto a shiny new box. If anyone has any suggestions as to why ES ran solidly for 6 months on 1GB heap and well in excess of 500 shards (V5.X.X), yet now, having no more than one additional index added to it, can't stand up for more than 3 hours, I'd love to hear them. We need to launch this as a prod cluster soon and in it's current state, it's unusable.

Thanks.


(Christian Dahlqvist) #8

Heap usage will go up to 75% before GC kicks in and as long as heap usage drops significantly at that point you should have a healthy heap usage pattern. The stats you have provided look fine.

What type of queries are you running against the cluster? Are you using ingest pipelines or scripts?


#9

Hi Christian,

Leaving the cluster overnight results in 2 of the 3 nodes crashing (in terms of the ES service stopping) along with logs regarding the garbage collection taking a long time. My understanding is that I shouldn't see the messages at all as ES will only log them if the process is taking longer than it should. That might not be correct however(?)

As far as inputs go, nothing different to what I've always run. FW logs are pushed to LS as a syslog, filtered and sent on to ES, 2 winlogbeat inputs to LS, one input reading from a simple log file, and one taking the result of a PS script run on a Windows server and output as a syslog to LS.

I'm using Kibana visulisations to build simple dashboards. My thoughts on that were that I might be querying the data too often and thus causing too much load on the heap. Today, I've shut down every machine I had that was pulling kibana dashboards regularly so I can perhaps narrow this down to it being the cause. The real issue I keep coming back to is that I didn't have this issue for months with a far more limited setup. Could you advise on whether it is possible that a badly formatted Logstash input config could cause the heap to become used up so quickly? This is my next port of call if I wak tomorrow to find the cluster in the same state :slight_smile:

Thanks again for your continued help!


(Aaron Mildenstein) #10

How big a time slice are you querying? You may be better off sticking with 15 minutes, or 1 hour. If you try to query a full 30 days on such a small cluster, it could indeed cause issues.


(andy_zhou) #11

change the JVM go to 31G and try again?


#12

@theuntergeek - Apologies for the late reply. I actually came to the same conclusion after exhausting everything else. I scaled back the Dashboards to once every 15 mins a week ago and the cluster is now stable. It's a lesson learnt. I went searching for something more complex as everything pointed at it being a resource limitation in terms of build. Should have checked the obvious first I guess. I just didn't think I'd be putting so much load on the cluster with a 5 minute refresh on some dashboards.

Thanks to each of you for your help and suggestions. Hopefully this post will serve someone else in the same situation from having the same issues.


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.