Elasticsearch Query Latency Issue During StatefulSet Rollout Restarts on ECK

Hello,

I am currently operating an Elasticsearch cluster using ECK on Kubernetes. We handle more than 1000 TPS requests to our Elasticsearch cluster. The issue we are facing is that when we apply changes like version upgrades or configuration updates to the ECK data nodes, the StatefulSet performing a rollout restart causes the shards to go into an initializing state. Even after the nodes come back up, they start without any cache, causing the query latency, which was previously within 100ms, to spike to up to 10s for the p99.

Could the Elasticsearch team please recommend a way to mitigate this issue?

Thank you.

From Elasticsearch to Elastic Cloud on Kubernetes (ECK)