ECK Sudden rise in data disk usage

We deployed Elasticsearch on Kubernetes (ECK) in GKE, with 2 GB of memory and a 1 GB persistent disk.

We got an out-of-storage exception. After that we increased the disk to 2 GB, and by the next day it had already reached 2 GB, even though we hadn't run any big queries. We then increased the persistent disk size to 10 GB, and since then the data disk usage has not grown any further.

On further analysis we found that all indices together take only about 20 MB, so we are unable to work out what is actually occupying the disk.

We used the Elasticsearch nodes stats API to get disk and node statistics.

I am unable to find the exact reason why the disk usage exceeded the available space or what data is on the disk. Please also suggest ways to prevent this in the future.

Judging from your screenshot I suspect your instance ran out of memory a few times and a JVM heap dump was taken. This is because Elasticsearch runs by default with the options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data, which means it will take a heap dump when it runs out of memory and store it in the data directory. With only 1 GB of disk space and 2 GB of memory, a single one of these heap dumps is enough to fill up your data directory.
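To confirm this, you can check the data directory for heap dump files (typically named java_pid<pid>.hprof). Assuming the official Elasticsearch image, whose data path is /usr/share/elasticsearch/data, and using the quickstart pod name from the example below as a placeholder:

kubectl exec quickstart-es-default-0 -- ls -lh /usr/share/elasticsearch/data

A large .hprof file there would explain the sudden jump in disk usage.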

How to avoid this problem? There are multiple approaches:

  • Make sure there is enough space on disk for the heap dumps, or define an alternative path for them, e.g. on an emptyDir volume, and configure that path via -XX:HeapDumpPath (see the sketch after this list)
  • Turn off heap dump creation on out-of-memory errors, e.g. with something like:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.9.2
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -XX:-HeapDumpOnOutOfMemoryError
  • Make sure your Elasticsearch instance does not run out of memory to begin with by giving it more memory, as described in the ECK docs (also shown in the sketch below)
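
For reference, here is a minimal sketch that combines the first and third options: heap dumps are redirected to an emptyDir scratch volume and the container gets more memory. The volume name, mount path, and the 4Gi/2g sizes are illustrative assumptions, not recommendations, so adjust them to your workload:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.9.2
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          # write heap dumps to the scratch volume instead of the data directory,
          # and size the heap to roughly half of the container memory
          - name: ES_JAVA_OPTS
            value: -Xms2g -Xmx2g -XX:HeapDumpPath=/usr/share/elasticsearch/heap-dumps
          resources:
            requests:
              memory: 4Gi
            limits:
              memory: 4Gi
          volumeMounts:
          - name: heap-dumps
            mountPath: /usr/share/elasticsearch/heap-dumps
        volumes:
        - name: heap-dumps
          emptyDir: {}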

@pebrc thanks for the reply.
What is the purpose of this heap dump and is it good to disable it?

It is a debugging tool that allows you to do a post-mortem analysis of a JVM process (like Elasticsearch) to answer questions like "what took up all the memory that led to the OOM event" or similar.

Is it good to disable it? It depends. If you disable this setting and your Elasticsearch instance runs out of memory, you will not be able to get a heap dump after the fact. You can of course reinstate the setting and capture a heap dump the next time around if the out-of-memory situation happens repeatedly.

In general, out-of-memory situations should be less likely on recent versions of Elasticsearch, thanks to additional safeguards such as circuit breakers that prevent operations from using more than the available amount of memory.

Hope that helps.