Pods not finding PVC when using storage class with Retain reclaim policy

pkaramol · June 14, 2021, 1:08pm

My Elasticsearch instance uses a StorageClass that has Retain (instead of Delete) as its reclaim policy.

Here are my PVCs before deleting the Elasticsearch instance

▶ k get pvc
NAME                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
elasticsearch-data--es-multirolenodes1-0   Bound    pvc-ba157213-67cf-4b81-8fe2-6211b771e62c   20Gi       RWO               balanced-retain-csi   8m15s
elasticsearch-data--es-multirolenodes1-1   Bound    pvc-e77dbb00-7cad-419f-953e-f3398e3860f4   20Gi       RWO               balanced-retain-csi   7m11s
elasticsearch-data--es-multirolenodes1-2   Bound    pvc-b258821b-0d93-4ea3-8bf1-db590b93adfd   20Gi       RWO               balanced-retain-csi   6m5s

I deleted and re-installed the helm chart, hoping that thanks to the Retain policy the new pods' PVCs would bind to the existing PVs (so no data would be lost).

However, the pods of the nodeSet are now all stuck in Pending state with this error:

Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Warning  FailedScheduling   2m37s                default-scheduler   persistentvolumeclaim "elasticsearch-data--es-multirolenodes1-0" is being deleted
  Normal   NotTriggerScaleUp  2m32s                cluster-autoscaler  pod didn't trigger scale-up: 2 persistentvolumeclaim "elasticsearch-data--es-multirolenodes1-0" not found
  Warning  FailedScheduling   12s (x7 over 2m37s)  default-scheduler   persistentvolumeclaim "elasticsearch-data--es-multirolenodes1-0" not found

Why is this happening?

Is there a way to keep the data when the Elasticsearch resource is accidentally deleted (and recreated with the exact same configuration)?

edit: Here are the corresponding PVs

▶ k get pv 
pvc-b258821b-0d93-4ea3-8bf1-db590b93adfd   20Gi       RWO            Retain           Released   elastic/elasticsearch-data--es-multirolenodes1-2          balanced-retain-csi            20m
pvc-ba157213-67cf-4b81-8fe2-6211b771e62c   20Gi       RWO            Retain           Released   elastic/elasticsearch-data--es-multirolenodes1-0          balanced-retain-csi            22m
pvc-e77dbb00-7cad-419f-953e-f3398e3860f4   20Gi       RWO            Retain           Released   elastic/elasticsearch-data--es-multirolenodes1-1          balanced-retain-csi            21m

There is of course no PVC now

▶ k get pvc                       
No resources found in elastic namespace.

The StorageClass in question uses the CSI driver for GCP persistent disks, fwiw.

michael.morello · June 15, 2021, 6:15am

A PV can't be used until it is in an Available state, according to the K8S documentation:

When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.

You must remove (or clean up) the claimRef on each PV to make it Available again.
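Clearing the claimRef can be done with a JSON patch that removes `spec.claimRef` from each Released PV. Here is a minimal Python sketch, assuming `kubectl` is on the PATH and pointed at the right cluster. The helper names are made up for illustration, and the PV names are the ones from the `k get pv` output above. By default it only builds and prints the commands; nothing is executed unless `run=True` is passed.

```python
import json
import shlex
import subprocess

# PV names taken from the `k get pv` output earlier in this thread.
RELEASED_PVS = [
    "pvc-b258821b-0d93-4ea3-8bf1-db590b93adfd",
    "pvc-ba157213-67cf-4b81-8fe2-6211b771e62c",
    "pvc-e77dbb00-7cad-419f-953e-f3398e3860f4",
]


def claim_ref_patch():
    """JSON patch that drops spec.claimRef from a PersistentVolume."""
    return [{"op": "remove", "path": "/spec/claimRef"}]


def patch_command(pv_name):
    """Build the kubectl argv for one PV (hypothetical helper)."""
    return [
        "kubectl", "patch", "pv", pv_name,
        "--type=json",
        "-p", json.dumps(claim_ref_patch()),
    ]


def release_all(pv_names, run=False):
    """Return the commands; kubectl is only invoked when run=True."""
    commands = [patch_command(name) for name in pv_names]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print shell-quoted commands for review before running them.
    for cmd in release_all(RELEASED_PVS):
        print(shlex.join(cmd))
```

Keep in mind that an Available PV can bind to any claim that matches its storage class, size and access mode, so this step alone does not guarantee that each pod gets its previous volume back.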

The primary way to back up Elasticsearch clusters is snapshots. For Kubernetes custom resources we do not have recommendations at the moment. It is probably tied to the way you back up other resources or your K8S control plane (with a CD pipeline using git, using etcd snapshots...), so it's hard to give specific recommendations.

Since ECK 1.5.0 you can also decouple the lifecycle of the cluster from the lifecycle of the PVCs.
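As a rough illustration of what that decoupling means, the sketch below encodes the two values documented for the `volumeClaimDeletePolicy` field of the Elasticsearch resource. The function and the event labels are invented for this example and are not ECK identifiers; it is only a summary of the documented behaviour, not operator code.

```python
# The two values documented for spec.volumeClaimDeletePolicy in ECK.
DELETE_ON_SCALEDOWN_AND_CLUSTER_DELETION = "DeleteOnScaledownAndClusterDeletion"  # default
DELETE_ON_SCALEDOWN_ONLY = "DeleteOnScaledownOnly"

# Event labels made up for this sketch.
SCALE_DOWN = "scale-down"
CLUSTER_DELETION = "cluster-deletion"

_POLICIES = (DELETE_ON_SCALEDOWN_AND_CLUSTER_DELETION, DELETE_ON_SCALEDOWN_ONLY)


def pvcs_deleted(policy, event):
    """Return True if the affected PVCs are expected to be deleted."""
    if policy not in _POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if event == SCALE_DOWN:
        # Both policies remove the PVCs of nodes that are scaled away.
        return True
    if event == CLUSTER_DELETION:
        # Only the default policy removes PVCs along with the cluster.
        return policy == DELETE_ON_SCALEDOWN_AND_CLUSTER_DELETION
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    for policy in _POLICIES:
        for event in (SCALE_DOWN, CLUSTER_DELETION):
            print(policy, event, pvcs_deleted(policy, event))
```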

We also provide a tool to reattach PVs, but it should only be used as a last resort.

pkaramol · June 15, 2021, 10:03am

Thanks for this elaborate answer.

It is just weird that the same process with the official helm charts, i.e. performing a helm install and then a helm delete, seems to keep the PVCs, and I am trying to find out why the behaviour differs.

pkaramol · June 15, 2021, 10:49am

No, it is ECK that I am using.

I just noticed that:

performing helm delete (when using the helm charts) retains the PVCs;
deleting the Elasticsearch resource (when using ECK) removes the PVCs.

In any case, setting volumeClaimDeletePolicy: DeleteOnScaledownOnly seems to do the job: I deleted and re-created the Elasticsearch resource (with the exact same configuration) and the data was actually retained (the PVCs were not deleted and the indices were there when the new pods came up).
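For reference, here is a sketch of where that setting sits in the resource, written as a small Python script that prints the manifest as JSON (which `kubectl apply -f -` accepts). The cluster name `my-cluster` and the version `7.13.1` are placeholders, not values from my setup; the nodeSet name, storage class, size and namespace are the ones shown earlier in the thread.

```python
import json


def elasticsearch_manifest(name, version, node_count=3):
    """Build an Elasticsearch resource that keeps its PVCs on deletion."""
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name, "namespace": "elastic"},
        "spec": {
            "version": version,
            # Keep the PVCs when the Elasticsearch resource is deleted.
            "volumeClaimDeletePolicy": "DeleteOnScaledownOnly",
            "nodeSets": [
                {
                    "name": "multirolenodes1",
                    "count": node_count,
                    "volumeClaimTemplates": [
                        {
                            "metadata": {"name": "elasticsearch-data"},
                            "spec": {
                                "accessModes": ["ReadWriteOnce"],
                                "storageClassName": "balanced-retain-csi",
                                "resources": {"requests": {"storage": "20Gi"}},
                            },
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder name and version; replace with your own before applying.
    print(json.dumps(elasticsearch_manifest("my-cluster", "7.13.1"), indent=2))
```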

pkaramol · June 15, 2021, 10:51am

BTW it is not very obvious / intuitive what volumeClaimDeletePolicy: DeleteOnScaledownOnly means.

In which case / scenario will the PVCs actually be deleted? (They were not deleted when deleting the Elasticsearch resource, anyway.)

system · July 13, 2021, 10:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
