ECK Elasticsearch pod failing to start because it couldn't load the persistent cache

Hi everyone!

I have an ECK-operator-managed ES cluster of 3 nodes which I am trying to spin up with new persistent volume claims (PVCs) that reuse the old persistent volumes (PVs) from a previous ES cluster.

To do this I deployed an Elasticsearch manifest with a volume claim template specifying 1Ti PVCs for each node. Once the cluster was running, I deleted the newly created PVCs and their PVs and created replacement PVCs bound to the old PVs. Once the new PVCs were bound to the old PVs, I rebooted each ES node to pick up the old data, but node-2, the last node, fails to start.
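Concretely, each replacement PVC pointed at an old PV via `volumeName`, roughly like this (a sketch; the PV name here is a placeholder for the real one):

```yaml
# Hypothetical replacement PVC that binds explicitly to a pre-existing PV.
# The old PV must have persistentVolumeReclaimPolicy: Retain, otherwise
# deleting its previous PVC would have deleted the underlying disk.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-data-elasticsearch-prod-backup-es-default-2
  namespace: eck-prod-backup
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: ssd
  volumeName: old-pv-name   # placeholder for the old PV's actual name
```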

Node-2 has the following error:

java.io.UncheckedIOException: Failed to load persistent cache
Likely root cause: java.io.IOException: No space left on device
	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:285)
	at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
	at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
	at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
	at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
	at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
	at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
	at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
	at org.apache.lucene.store.OutputStreamIndexOutput.getChecksum(OutputStreamIndexOutput.java:80)
	at org.apache.lucene.codecs.CodecUtil.writeCRC(CodecUtil.java:569)
	at org.apache.lucene.codecs.CodecUtil.writeFooter(CodecUtil.java:393)
	at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:582)
	at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:485)
	at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:803)
	at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:5088)
	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3461)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3771)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3729)
	at org.elasticsearch.xpack.searchablesnapshots.cache.full.PersistentCache$CacheIndexWriter.commit(PersistentCache.java:524)
	at org.elasticsearch.xpack.searchablesnapshots.cache.full.PersistentCache.repopulateCache(PersistentCache.java:261)
	at org.elasticsearch.xpack.searchablesnapshots.cache.full.CacheService.doStart(CacheService.java:201)
	at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:48)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1087)
	at org.elasticsearch.node.Node.start(Node.java:801)
	at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:311)
	<<<truncated>>>

I'm not sure what "no space left on device" means exactly here, because the persistent disk allocated to the node has 1TB, which is about 500x more space than it originally had in the old ES cluster. Additionally, the persistent volume in k8s is set to 1Ti:

spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: ssd
  volumeMode: Filesystem
  volumeName: pvc-9dd48c27-4713-4453-a5e9-9334b0fd0e28
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Ti
  phase: Bound

and the corresponding PVC requests 1Ti too:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: elasticsearch-prod-backup
    elasticsearch.k8s.elastic.co/statefulset-name: elasticsearch-prod-backup-es-default
  name: elasticsearch-data-elasticsearch-prod-backup-es-default-1
  namespace: eck-prod-backup
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: ssd
  volumeMode: Filesystem
  volumeName: pvc-3fc7c42b-6d1a-4f2b-aa69-eb6d67aee425

I'm thinking it's possible I may have missed some ECK-operator-related metadata, such as a label or annotation on the above PVC, which would cause node-2 not to find it and thus report that no space is available. But that's just a thought.

The other 2 ES pods (node-0 and node-1) loaded their PVCs fine and are running, though they log the following error:

"stacktrace": ["org.elasticsearch.xpack.monitoring.exporter.ExportException: UnavailableShardsException[[.monitoring-kibana-7-2021.07.05][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-kibana-7-2021.07.05][0]] containing [2] requests]]"

which I'm assuming is because node-2 is failing to start.

I would greatly appreciate any ideas on this!

Thank you for reading all of this, it was a lengthy post!

I have since been able to redeploy the ES cluster without having to touch the PVCs it creates, by preparing the OLD PVs in such a way that they are now bound automatically to the new PVCs on deployment.
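In case it's useful to others: "preparing" the old PVs meant pre-setting each PV's claimRef to the name of the PVC the operator's StatefulSet was about to create, so the binding controller matches them up on its own. A sketch of the relevant spec fragment (the PVC name follows the StatefulSet naming convention from my setup):

```yaml
# On the old PV, point claimRef at the PVC that ECK will create.
# Leaving out resourceVersion/uid lets the binding controller fill them in
# once the new claim appears.
spec:
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: elasticsearch-data-elasticsearch-prod-backup-es-default-2
    namespace: eck-prod-backup
```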

Yet despite this, I am still seeing the same error:

java.io.UncheckedIOException: Failed to load persistent cache
Likely root cause: java.io.IOException: No space left on device

So I really have no idea what is wrong with it now, since I can rule out any metadata being left out in my previous attempt.

I found a line in the startup logs of the failing node which says:

{"type": "server", "timestamp": "2021-07-05T14:52:48,433Z", "level": "INFO", "component": "o.e.e.NodeEnvironment", "cluster.name": "elasticsearch-prod-backup", "node.name": "elasticsearch-prod-backup-es-default-2", "message": "using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sdb)]], net usable_space [0b], net total_space [1.9gb], types [ext4]" }

It's interesting because it says it has 0b usable out of a total of 1.9gb. This makes no sense to me, as the persistent disk in GCP is 1024GB, the Elasticsearch manifest requests 1Ti, and the PVC and PV both request 1Ti.

Is this a bug with the ECK operator, perhaps? I really don't know at this point.

This looks like something is misconfigured in the way the volume is mounted.
Could you post the entire Elasticsearch resource yaml manifest?

I mounted the volume to another pod so I could check it with df, and it showed 1.9GB, which surprised me. I thought it was 1TB, since the persistent disk in GCP is 1TB, along with the PV and PVC. I'm not sure how to increase it, however.
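The check was done with a throwaway pod along these lines (a minimal sketch; the pod name and image are arbitrary, the claim name is from my setup):

```yaml
# Throwaway pod that mounts the suspect PVC so df can be run against it:
#   kubectl exec -n eck-prod-backup pv-inspector -- df -h /data
apiVersion: v1
kind: Pod
metadata:
  name: pv-inspector
  namespace: eck-prod-backup
spec:
  containers:
    - name: inspector
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: elasticsearch-data-elasticsearch-prod-backup-es-default-2
```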

Could you post the entire Elasticsearch resource yaml manifest so we can make sure there are no inconsistencies in the way you declare your volumes?

Sure

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-prod-backup
  namespace: eck-prod-backup
spec:
  version: 7.12.1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
    - name: default
      count: 3
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadOnlyMany
            resources:
              requests:
                storage: 1Ti
            storageClassName: ssd
      config:
        node.master: true
        node.data: true
        node.ingest: true
      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]
          containers:
            - name: elasticsearch
              # specify resource limits and requests
              resources:
                requests:
                  memory: 2Gi
                  cpu: 1.5
          tolerations:
            - key: service
              operator: Equal
              value: elasticsearch-backup
              effect: NoSchedule
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: service
                        operator: In
                        values:
                          - elasticsearch-backup

The PVCs are generated automatically from the volume claim template, but I've stripped them back:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: elasticsearch-prod-backup
    elasticsearch.k8s.elastic.co/statefulset-name: elasticsearch-prod-backup-es-default
  name: elasticsearch-data-elasticsearch-prod-backup-es-default-1-new
  namespace: eck-prod-backup
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: ssd
  volumeMode: Filesystem

and the PV

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
  creationTimestamp: "2021-06-03T14:29:26Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: us-west1
    failure-domain.beta.kubernetes.io/zone: us-west1-c
    name: pvc-0112dc42-48d0-46e5-be18-71f4bff3ac9d
  resourceVersion: "50549753"
  selfLink: /api/v1/persistentvolumes/pvc-0112dc42-48d0-46e5-be18-71f4bff3ac9d
  uid: a5f00f53-274f-475b-95d7-3d6bd44c758a
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Ti
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: elasticsearch-data-elasticsearch-prod-backup-es-default-0
    namespace: eck-prod-backup
    resourceVersion: "50424988"
    uid: a8e4ec88-86fe-480f-9428-338bde5233b9
  gcePersistentDisk:
    fsType: ext4
    pdName: pvc-adb7bc67--pvc-0112dc42-48d0-46e5-be18-71f4bff3ac9d
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-west1-c
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - us-west1
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ssd
  volumeMode: Filesystem

Thanks. This looks correct.

I mounted the volume to another pod so I could check it with df, and it showed 1.9GB, which surprised me.

:frowning:
This looks like an issue on the GCP side?

My theory is that the original storage class config caused the issue. To begin with, the storage class didn't allow volume expansion when the original persistent volumes and claims were dynamically provisioned.

Some time later I hit a disk pressure alarm and tried to resize the PVs by updating the storage class and then resizing the PVs, which I thought worked, but evidently it did not!
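For context, expansion only works if the storage class opts in. A sketch of what mine should have looked like from the start (the provisioner and parameters reflect my GCE setup and are assumptions):

```yaml
# Volume expansion must be enabled on the StorageClass before PVCs can be
# resized; editing a PVC's storage request then triggers the disk (and,
# on pod restart, the filesystem) resize.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
allowVolumeExpansion: true
```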

The fix was to create new persistent disks and volumes, copy the data over to the new ones, and then bring ES up with the new PVs. That was quite a horrible experience!
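If it helps anyone, the copy step can be done with a one-off pod that mounts both claims (a rough sketch; the claim names are placeholders, and Elasticsearch should be stopped while it runs):

```yaml
# One-off pod that mounts the old and new PVCs and copies the data across,
# preserving ownership and permissions.
apiVersion: v1
kind: Pod
metadata:
  name: pv-copy
  namespace: eck-prod-backup
spec:
  restartPolicy: Never
  containers:
    - name: copy
      image: busybox
      command: ["sh", "-c", "cp -a /old/. /new/ && sync"]
      volumeMounts:
        - name: old-data
          mountPath: /old
        - name: new-data
          mountPath: /new
  volumes:
    - name: old-data
      persistentVolumeClaim:
        claimName: old-elasticsearch-data   # placeholder
    - name: new-data
      persistentVolumeClaim:
        claimName: new-elasticsearch-data   # placeholder
```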