GKE Upgrade and PDB

Hey Guys,

I am installing ECK on GKE.
As you may know, GKE provides cluster auto-upgrade as well as manual node pool upgrades (it upgrades nodes one by one in each zone).

If we set up a PDB, GKE will respect it, but only for a maximum of 1 hour.

However, I don't know if 1 hour is enough for the replicas to relocate to another node.

What will happen in that case, and does ECK react to it accordingly?

https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster

Hi Vincent,

ECK already sets up a PDB with a maximum of one Pod allowed to be taken down (see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-pod-disruption-budget.html). I think that should be good enough for most cases where you don't run multiple Elasticsearch Pods per Kubernetes node.
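
For reference, the linked docs also show how to override that default budget in the Elasticsearch resource if you ever need something stricter. A minimal sketch (the cluster name, version, and counts here are just placeholders):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.17.0
  nodeSets:
  - name: default
    count: 3
  podDisruptionBudget:
    spec:
      minAvailable: 2            # keep at least 2 of the 3 Pods available during voluntary disruptions
      selector:
        matchLabels:
          elasticsearch.k8s.elastic.co/cluster-name: quickstart
```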

Hi @Vincent_Ngai, thanks for your message.

As to the replica relocation: if you are using network-attached storage, replica relocation wouldn't normally be needed. As the Pod disappears and gets recreated on another node, its PV follows it and gets reattached.

If you are using local storage and it gets lost during the node upgrade, Elasticsearch will replicate the missing state after the new Pod is up and running. How long that takes depends on the state size, network performance, and cluster load.

Currently, we don't have any specific recommendations around this, but I've created https://github.com/elastic/cloud-on-k8s/issues/2448 to track it.

Yes, I know it lets us set a PDB,
but as I said, the GKE cluster will not wait forever.

Let's say my cluster has 3 nodes (each physical node with 1 Elasticsearch node), I have 1 replica for my index, and my PDB only allows 1 unavailable.

If GKE gets upgraded,
it will kill one of the nodes,
and ECK will start reassigning the shards to another node (I suppose it will?).

During that time GKE will wait because of the PDB,
but it will only wait 1 hour.

I don't even know what happens if the shard migration is not complete after 1 hour.
Does that mean there is a high chance my data will be lost?

I am using PD (Persistent Disk) in GKE,
which is a zonal disk.
If the Pod is gone that's fine, the recreated Pod can mount the disk back.
However, I am not sure when ECK will do the migration.

In this case, your cluster should be fine.

If you are using persistent disks, the Pod will be deleted and recreated on a different Kubernetes node. As soon as this happens, the PV backed by your persistent disk is attached to the new Pod and Elasticsearch continues operating as normal. There is no migration to be done, as the data on that PD was not lost. If the Pod takes some time to come up, a data migration to a different Pod might start, but as soon as the new Pod (with the already existing PV) rejoins the Elasticsearch cluster, the migration is cancelled.
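
For context, this is the setup where the PV follows the Pod: the node set uses a volumeClaimTemplate backed by a PD-based StorageClass. A minimal sketch (the cluster name, version, sizes, and the `standard` class name are assumptions, adjust to your setup):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.17.0
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data   # ECK mounts the claim with this name as the data volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard # GKE's zonal-PD-backed class in this sketch
```

When the node is drained during the upgrade, the Pod is recreated elsewhere in the same zone and the PVC reattaches the same persistent disk, so no shard data needs to be copied.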


Thanks man, that really addresses my concern.

Happy to help!