Pods not finding PVC when using storage class with Retain reclaim policy

For my Elasticsearch instance I am using a StorageClass whose reclaim policy is Retain (instead of Delete).

Here are my PVCs before deleting the Elasticsearch instance

▶ k get pvc
NAME                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
elasticsearch-data--es-multirolenodes1-0   Bound    pvc-ba157213-67cf-4b81-8fe2-6211b771e62c   20Gi       RWO               balanced-retain-csi   8m15s
elasticsearch-data--es-multirolenodes1-1   Bound    pvc-e77dbb00-7cad-419f-953e-f3398e3860f4   20Gi       RWO               balanced-retain-csi   7m11s
elasticsearch-data--es-multirolenodes1-2   Bound    pvc-b258821b-0d93-4ea3-8bf1-db590b93adfd   20Gi       RWO               balanced-retain-csi   6m5s

I deleted and re-installed the Helm chart, hoping that, thanks to the Retain policy, the PVCs of the new pods would bind to the existing PVs and no data would be lost.

However, the pods of the nodeSet are now all in Pending state with this error:

Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Warning  FailedScheduling   2m37s                default-scheduler   persistentvolumeclaim "elasticsearch-data--es-multirolenodes1-0" is being deleted
  Normal   NotTriggerScaleUp  2m32s                cluster-autoscaler  pod didn't trigger scale-up: 2 persistentvolumeclaim "elasticsearch-data--es-multirolenodes1-0" not found
  Warning  FailedScheduling   12s (x7 over 2m37s)  default-scheduler   persistentvolumeclaim "elasticsearch-data--es-multirolenodes1-0" not found

Why is this happening?

Is there a way to keep the data when the Elasticsearch resource is accidentally deleted (and recreated with the exact same configuration)?

edit: Here are the corresponding PVs

▶ k get pv 
pvc-b258821b-0d93-4ea3-8bf1-db590b93adfd   20Gi       RWO            Retain           Released   elastic/elasticsearch-data--es-multirolenodes1-2          balanced-retain-csi            20m
pvc-ba157213-67cf-4b81-8fe2-6211b771e62c   20Gi       RWO            Retain           Released   elastic/elasticsearch-data--es-multirolenodes1-0          balanced-retain-csi            22m
pvc-e77dbb00-7cad-419f-953e-f3398e3860f4   20Gi       RWO            Retain           Released   elastic/elasticsearch-data--es-multirolenodes1-1          balanced-retain-csi            21m

There is of course no PVC now

▶ k get pvc                       
No resources found in elastic namespace.

FWIW, the StorageClass in question uses the CSI driver for GCP Persistent Disk.

A PV can't be used until it is in an Available state, according to the K8S documentation:

When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.

You must remove the claimRef from each PV to make it Available again.
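As a rough sketch of that step, assuming kubectl access to the cluster: a JSON patch that removes spec.claimRef should move a PV from Released back to Available. The helper name make_available is made up for illustration, and the PV names are whatever `k get pv` shows. Note that a new PVC is not guaranteed to bind to the PV that held its previous data, so this alone may not restore each node's original volume.

```shell
# JSON patch that removes the stale claim reference from a PV.
CLAIMREF_PATCH='[{"op": "remove", "path": "/spec/claimRef"}]'

# make_available PV_NAME... (hypothetical helper)
# Clears spec.claimRef on each named PV.
make_available() {
  for pv in "$@"; do
    kubectl patch pv "$pv" --type json -p "$CLAIMREF_PATCH"
  done
}

# Usage, with a PV name taken from `k get pv`:
#   make_available pvc-ba157213-67cf-4b81-8fe2-6211b771e62c
```

Running `k get pv` afterwards should show the status of the patched PVs.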

The primary way to back up Elasticsearch clusters is snapshots. For Kubernetes custom resources we do not have recommendations at the moment. It is likely tied to the way you back up other resources or your K8S control plane (a CD pipeline using git, etcd snapshots...), so it's hard to give specific recommendations.

Since ECK 1.5.0 you can also decouple the lifecycle of the cluster from the lifecycle of the PVCs.

We also provide a tool to reattach PVs, but it must be used only as a last resort.

Thanks for this elaborate answer.

It is just weird that the same process, i.e. performing a helm install and then a helm delete, seems to keep the PVCs when using the official Helm charts, and I am trying to find out why the behaviour differs.

No, it is ECK that I am using.

I just noticed that

  • performing helm delete (when using the helm charts) retains the PVCs
  • deleting the Elasticsearch resource (when using ECK) removes the PVCs

In any case, setting volumeClaimDeletePolicy: DeleteOnScaledownOnly seems to do the job: I deleted and re-created the Elasticsearch resource (with the exact same configuration) and the data was retained (the PVCs were not deleted and the indices were there when the new pods came up).
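For anyone landing here, a minimal sketch of setting that policy on an existing resource with a merge patch, assuming ECK 1.5.0 or later. The helper name keep_pvcs_on_delete and its two arguments (resource name and namespace) are placeholders for illustration. The same field can be set directly under spec in the Elasticsearch manifest.

```shell
# Merge patch that sets spec.volumeClaimDeletePolicy on the Elasticsearch resource.
POLICY_PATCH='{"spec": {"volumeClaimDeletePolicy": "DeleteOnScaledownOnly"}}'

# keep_pvcs_on_delete ES_NAME NAMESPACE (hypothetical helper)
keep_pvcs_on_delete() {
  kubectl patch elasticsearch "$1" -n "$2" --type merge -p "$POLICY_PATCH"
}

# Usage (substitute your own resource name and namespace):
#   keep_pvcs_on_delete <es-name> <namespace>
```

As far as I understand the ECK documentation, the default is DeleteOnScaledownAndClusterDeletion, while DeleteOnScaledownOnly removes PVCs only when nodes are scaled down and keeps them when the Elasticsearch resource itself is deleted.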

BTW, it is not very obvious or intuitive what volumeClaimDeletePolicy: DeleteOnScaledownOnly means.

In which scenario will the PVCs actually be deleted? (They were not deleted when I deleted the Elasticsearch resource, anyway.)
