Adding a Custom Volume for Backups

Hello,

I'm interested in mounting a shared NFS PersistentVolume across an Elasticsearch ECK cluster so that I can add it as a snapshot repository. (This seems to be the obvious solution for on-prem snapshotting.)

I've created a deployment (by cribbing from the Synonym ConfigMap example) which attempts to use the podTemplate to accomplish this:

[SNIP]
podTemplate:
  metadata:
    labels:
       es-role: "data-search"
  spec:
    containers:
    - name: elasticsearch
      resources:
        limits:
          memory: 60G
          cpu: 7
      env:
      - name: ES_JAVA_OPTS
        value: "-Xms30g -Xmx30g"
      volumeMounts:
      - name: snapshot-claim-volume
        mountPath: /mnt/snapshots
      volumes:
      - name: snapshot-claim-volume
        persistentVolumeClaim:
          claimName: snapshot-claim
[SNIP]

When I try to deploy this using ECK, it gets stuck starting, with:

create Pod fusion-search-a-es-data-searchers-0 in StatefulSet fusion-search-a-es-data-searchers failed error: Pod "fusion-search-a-es-data-searchers-0" is invalid: [spec.containers[0].volumeMounts[12].name: Not found: "snapshot-claim-volume", spec.initContainers[0].volumeMounts[12].name: Not found: "snapshot-claim-volume"]

Have I misunderstood something?

What am I doing wrong here?

Many thanks!

-Z

Hi Z,
It's hard for me to say for sure with just the snippet. The StatefulSet example may be helpful (as that is what ECK uses on the backend):
https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset

Here's the full configuration:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: snapshot-claim
  namespace: fusion-prod
spec:
  accessModes:
  - ReadWriteOnce
  - ReadWriteMany
  resources:
    requests:
      storage: 3900Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: fusion-prod-snapshots
---
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: fusion-search-a
  namespace: fusion-prod
spec:
  version: 7.4.0
  nodeSets:
  - name: masters
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.ml: false
    podTemplate:
      metadata:
        labels:
           es-role: "master"
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 60G
              cpu: 7
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms30g -Xmx30g"
          volumeMounts:
          - name: snapshot-claim-volume
            mountPath: /mnt/snapshots
          volumes:
          - name: snapshot-claim-volume
            persistentVolumeClaim:
              claimName: snapshot-claim
    count: 3
    volumeClaimTemplates:
      - metadata:
          name: elasticsearch-data
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 400G
          storageClassName: local-path

When I look at the list of volumes in the workload, I don't see the volume I requested in the podTemplate, but I do see 13 other volumes (including elasticsearch-data).

I have tested the PVC and the PV, and both seem to be working correctly.

I was wondering if perhaps the operator does something special with the content of podTemplate which would prevent this from working.

Many thanks!

Ahh, I think I understand my mistake here.

You can't directly configure a pod volume in a StatefulSet. You can only configure storage through a volumeClaimTemplate. But this creates a PVC for each Pod, which won't work because I would need the Pods to share the PVC.

I'll have to rethink my strategy.

Hey Z,

What you are trying to achieve here does make sense and should work.
In your snippet above, in the podTemplate, I think the volumes section should be one level higher (under spec, but not under containers). See this example: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#claims-as-volumes. The volumeMounts section seems to be in the right place in the elasticsearch container. Otherwise, your volume, volume mount, and claim seem to be configured correctly.
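To illustrate, here is a sketch of how the podTemplate from your snippet would look with the volumes section moved up (keeping your names; trimmed to just the storage-related fields):

```yaml
podTemplate:
  spec:
    containers:
    - name: elasticsearch
      # volumeMounts stays inside the container, as you have it
      volumeMounts:
      - name: snapshot-claim-volume
        mountPath: /mnt/snapshots
    # volumes is a sibling of containers, at the Pod spec level
    volumes:
    - name: snapshot-claim-volume
      persistentVolumeClaim:
        claimName: snapshot-claim
```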

The podTemplate section really is for the configuration of the Pods that belong to the StatefulSet, so you should be fine.

Can you give it another try with the volumes section at the right place?

I am in a similar situation where I have an on-prem Kubernetes cluster and I am trying to set up snapshots with ECK.

I've set up the configuration per the suggestions above; however, when I test locally with a single node, the operator seems to start two Pods. The operator logs reveal that it is unable to find the PVC.

I am initially trying to do this to enable me to upgrade from the 0.9 operator to 1.0 without losing data, but I will later use this solution to enable backups in production.
Elasticsearch config:

---
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.3.0
  nodes:
  - nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
        storageClassName: standard
    podTemplate:
      metadata:
        annotations:
          linkerd.io/inject: disabled
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms1g -Xmx1g
          resources:
            requests:
              memory: 2Gi
            limits:
              memory: 2Gi
          volumeMounts:
          - mountPath: /snapshot-data
            name: snapshot-data
        volumes:
        - name: snapshot-data
          persistentVolumeClaim:
            claimName: elasticsearch-snapshots

Persistent Volume Claim config:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-snapshots
  namespace: app-namespace
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

@LegendaryAced this is intriguing. Can you post the relevant operator logs (please include some before and after so we get a bit more)?
The way we manage PersistentVolumeClaims has changed a lot between 0.9 and 1.0.0-beta1, since we now rely on StatefulSets.

@sebgl sure. Here is a snippet of the operator 0.9 logs. Operator v0.9 is the only operator running in a fresh cluster.
Unfortunately, I can't post more than this as there is a character limit of 7000.

{"level":"error","ts":1572353484.079456,"logger":"mutation","msg":"Volume is referring to unknown PVC","error":"no PVC named elasticsearch-snapshots found","stacktrace":"github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/comparison.comparePersistentVolumeClaims\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/comparison/pvc.go:47\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/comparison.PodMatchesSpec\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/comparison/pod.go:43\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation.getAndRemoveMatchingPod\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/calculate.go:115\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation.mutableCalculateChanges\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/calculate.go:60\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation.CalculateChanges\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/mutation/calculate.go:44\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/driver.(*defaultDriver).calculateChanges\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/driver/default.go:515\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/driver/default.go:245\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch.(*ReconcileElasticsearch).internalReconcile\n\t/go/src/githu
b.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/elasticsearch_controller.go:270\ngithub.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch.(*ReconcileElasticsearch).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/pkg/controller/elasticsearch/elasticsearch_controller.go:215\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1572353484.0802917,"logger":"driver","msg":"Calculated all required changes","to_create:":1,"to_keep:":0,"to_delete:":2,"namespace":"app-namespace","es_name":"elasticsearch"}
{"level":"info","ts":1572353484.0804422,"logger":"driver","msg":"Calculated performable changes","schedule_for_creation_count":0,"schedule_for_deletion_count":1,"namespace":"app-namespace","es_name":"elasticsearch"}
{"level":"info","ts":1572353484.0816941,"logger":"version7","msg":"Setting voting config exclusions","excluding":["elasticsearch-es-q89szzbz7m"]}
{"level":"info","ts":1572353484.12672,"logger":"elasticsearch-controller","msg":"Updating status","iteration":354,"namespace":"app-namespace","es_name":"elasticsearch"}
{"level":"info","ts":1572353484.1267564,"logger":"generic-reconciler","msg":"Aggregated reconciliation results complete","result":{"Requeue":false,"RequeueAfter":31448891515078183}}
{"level":"info","ts":1572353484.1267705,"logger":"elasticsearch-controller","msg":"End reconcile iteration","iteration":354,"took":0.643675383,"namespace":"app-namespace","es_ame":"elasticsearch"}
{"level":"error","ts":1572353484.1267903,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"app-namespace/elasticsearch","error":"no PVC named elasticsearch-snapshots found","errorCauses":[{"error":"no PVC named elasticsearch-snapshots found"}],"stacktrace":"github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Ah, spotted: that's a bug in the 0.9 release.
It is only able to retrieve PVCs that are labeled with:

"elasticsearch.k8s.elastic.co/cluster-name": "elasticsearch"
"common.k8s.elastic.co/type": "elasticsearch"

Can you try labelling the PVC this way?

This should be fixed in 1.0.0-beta1 already. PVCs are handled completely differently; the code block that returns the error you're getting doesn't even exist anymore.

@sebgl that has worked. Thanks a lot for your help. :smile:

I'm yet to achieve the upgrade without losing data. I will be back if I find anything else.

I have now achieved the above for an on-premises cluster.
I just wanted to point out that the workaround @sebgl mentioned for v0.9 works; however, the labels must be removed from the PVC before installing v1.0, otherwise the PVC gets deleted.