Volume claim templates cannot be modified

Hi there,

I'm trying to upgrade my ECK cluster by bumping the apiVersion from elasticsearch.k8s.elastic.co/v1beta1 to elasticsearch.k8s.elastic.co/v1 and the Elasticsearch version from 7.5.1 to 7.6.0.
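
For reference, the relevant part of my manifest now looks roughly like this (cluster and nodeSet names and counts simplified):

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: quickstart
    spec:
      version: 7.6.0
      nodeSets:
      - name: default
        count: 3
        # volumeClaimTemplates etc. unchanged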

After applying the configuration, the operator logs show:

{"level":"error","@timestamp":"2020-02-15T20:37:22.044Z","logger":"controller-runtime.controller","message":"Reconciler error","ver":"1.0.1-bcb74688","controller":"elasticsearch-controller","request":"default/quickstart","error":"admission webhook \"elastic-es-validation-v1.k8s.elastic.co\" denied the request: Elasticsearch.elasticsearch.k8s.elastic.co \"quickstart\" is invalid: spec.nodeSet[2].volumeClaimTemplates: Invalid value: []v1.PersistentVolumeClaim{v1.PersistentVolumeClaim{TypeMeta:v1.TypeMeta{Kind:\"\", APIVersion:\"\"}, ObjectMeta:v1.ObjectMeta{Name:\"elasticsearch-data\", GenerateName:\"\", Namespace:\"\", SelfLink:\"\", UID:\"\", ResourceVersion:\"\", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:\"\", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v1.PersistentVolumeClaimSpec{AccessModes:[]v1.PersistentVolumeAccessMode{\"ReadWriteOnce\"}, Selector:(*v1.LabelSelector)(nil), Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList{\"storage\":resource.Quantity{i:resource.int64Amount{value:3298534883328, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:\"3Ti\", Format:\"BinarySI\"}}}, VolumeName:\"\", StorageClassName:(*string)(0xc001120c50), VolumeMode:(*v1.PersistentVolumeMode)(nil), DataSource:(*v1.TypedLocalObjectReference)(nil)}, Status:v1.PersistentVolumeClaimStatus{Phase:\"\", AccessModes:[]v1.PersistentVolumeAccessMode(nil), Capacity:v1.ResourceList(nil), Conditions:[]v1.PersistentVolumeClaimCondition(nil)}}}: Volume claim templates cannot be modified","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191028221656-72ed19daf4bb/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191028221656-72ed19daf4bb/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191028221656-72ed19daf4bb/pkg/util/wait/wait.go:88"}

This suggests that at some point the volume claim template was modified, but is there anything in this output that tells me which volume claim template it is and what the value should actually be? Will changing it back fix this?

Thanks!

Hi @NickL2,

According to the logs above, it should be the third nodeSet in your Elasticsearch manifest: spec.nodeSet[2].volumeClaimTemplates (indices are zero-based).

Could you try to retrieve the corresponding StatefulSet (its name should be composed of the cluster name + the nodeSet name) from Kubernetes? Then inspect its volumeClaimTemplates section and look at how it differs from the one in your Elasticsearch manifest.

If you are not sure, please feel free to paste your Elasticsearch YAML manifest here, along with the StatefulSet YAML manifest (kubectl get statefulset quickstart-<nodeSet-name> -o yaml).
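
If it helps, something like this pulls out just the two sections to compare (quickstart comes from your log, substitute your actual cluster and nodeSet names):

    # volume claim templates as stored on the StatefulSet
    kubectl get statefulset quickstart-<nodeSet-name> -o jsonpath='{.spec.volumeClaimTemplates}'

    # volume claim templates as declared on the Elasticsearch resource (third nodeSet in your case)
    kubectl get elasticsearch quickstart -o jsonpath='{.spec.nodeSets[2].volumeClaimTemplates}'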

Thanks for the response @sebgl! I wound up resolving the issue by removing that nodeSet from my manifest and then adding it back in. It was a bit of a pain, but it worked.

Hi @sebgl, I ran into the same problem and I think this might be a bug.
This happens when you write something like 1024Gi in the volume claim template, which Kubernetes automatically normalizes to 1Ti in the PVC spec. When ECK then tries to apply any change, it compares the new volume claim template against the existing one, decides that 1024Gi is not equal to 1Ti, and rejects the request with "volume claim templates cannot be modified".

The awkward part is that when I tried to update my spec.yaml and change 1024Gi to 1Ti, ECK wouldn't allow that either. And whenever I try to modify anything else, the operator logs the error above and nothing gets applied.
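
To illustrate, you can see the mismatch by asking the API server for the stored value on each object, something like this (the resource names are placeholders for mine):

    # storage request as stored on the StatefulSet (normalized by Kubernetes)
    kubectl get statefulset <cluster-name>-<nodeSet-name> -o jsonpath='{.spec.volumeClaimTemplates[0].spec.resources.requests.storage}'
    # -> 1Ti in my case

    # storage request as written in the Elasticsearch spec
    kubectl get elasticsearch <cluster-name> -o jsonpath='{.spec.nodeSets[0].volumeClaimTemplates[0].spec.resources.requests.storage}'
    # -> 1024Gi in my case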

@huntlyroad interesting, can you paste the contents of your Elasticsearch YAML manifest and of the existing StatefulSets' YAML manifests?

Hi @sebgl. Sorry, I can't paste the entire manifest because I'm not sure whether it contains any sensitive info. I can provide the volumeClaimTemplates part, hope that helps.

Here's the volumeClaimTemplates section in my manifest:

    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1024Gi
        storageClassName: local-ssd
        selector: 
          matchLabels: 
            labels: 1

Here's the volumeClaimTemplates block from the kubectl get statefulset -o yaml output:

  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: elasticsearch-data
      ownerReferences:
      - apiVersion: elasticsearch.k8s.elastic.co/v1
        blockOwnerDeletion: false
        controller: true
        kind: Elasticsearch
        name: name
        uid: 3d697090-f7b7-41e6-a66a-726a9693f7fa
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Ti
      selector:
        matchLabels:
          labels: 1
      storageClassName: local-ssd
      volumeMode: Filesystem
    status:
      phase: Pending

Thanks @huntlyroad.
I managed to reproduce this locally; I think it's a bug in our webhook, which should compare storage sizes without depending on their internal representation.

I created an issue in our GitHub repository: https://github.com/elastic/cloud-on-k8s/issues/2856. Thanks for reporting this bug!

As a workaround until that's fixed, you can disable the validating webhook:

kubectl delete validatingwebhookconfiguration elastic-webhook.k8s.elastic.co

If you later want to re-enable the webhook (validation can be useful for other purposes), you can simply reapply the ECK installation YAML manifests.
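
For example, if you installed the operator from the all-in-one manifest, re-applying it recreates the webhook configuration. The version in the URL below is just an example, use the one you originally installed from:

    # re-applies the operator resources, including the ValidatingWebhookConfiguration
    kubectl apply -f https://download.elastic.co/downloads/eck/1.0.1/all-in-one.yaml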

For local-ssd, you may want to follow the steps here to first manually create the PV so that the PVC can bind.
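
As a rough sketch, a manually created local PersistentVolume for the local-ssd StorageClass could look like the following; the PV name, node name, path and size are assumptions, so adjust them to your cluster (the labels block only matters because the volumeClaimTemplate above uses a matchLabels selector):

    # sketch only: create one such PV per node/disk you want Elasticsearch to use
    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: es-local-pv-0
      labels:
        labels: "1"  # matches the selector.matchLabels in the volumeClaimTemplate above
    spec:
      capacity:
        storage: 1Ti
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-ssd
      local:
        path: /mnt/disks/ssd0
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - <node-name>
    EOF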

As a workaround, you can specify '1Ti' or '2Ti' in the storage request. With ECK version 1.1 it works:

    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
        labels:
          app: elastic
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti

It seems it's already been picked up by the Elastic team and will be solved in a future release: https://github.com/elastic/cloud-on-k8s/pull/2857.