Hi,
Lately we have been running a couple of tests against Elasticsearch clusters on Kubernetes.
Unfortunately, one of our tests didn’t go as we anticipated.
A little bit of practical information about our cluster:
- The deployment of ECK is on the OpenShift platform, version 4.6.1
- ECK operator version - 1.2.0
- Elasticsearch version - 7.6.2
- Three bare-metal OpenShift servers on which the pods are deployed
- The configuration we work with is based on Availability Zone Awareness, as presented in the documentation here: Advanced Elasticsearch node scheduling | Elastic Cloud on Kubernetes [1.3] | Elastic
- In our deployment, we chose to distribute the pods into two zones (a rough sketch of the manifest follows this list)
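For context, here is a rough sketch of how our nodeSets are split across the two zones, following the documentation example linked above. The cluster name, node counts, zone values, and the node label key are placeholders rather than our exact values:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster                 # placeholder cluster name
spec:
  version: 7.6.2
  nodeSets:
  - name: zone-a
    count: 1                       # placeholder count
    config:
      node.attr.zone: zone-a
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: failure-domain.beta.kubernetes.io/zone   # placeholder zone label on our bare-metal nodes
                  operator: In
                  values:
                  - zone-a
  - name: zone-b
    count: 1                       # placeholder count
    config:
      node.attr.zone: zone-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: failure-domain.beta.kubernetes.io/zone
                  operator: In
                  values:
                  - zone-b
```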
Basically, our tests were designed to check high availability.
This specific test simulates a network failure on one server in one zone and goes as follows:
Expectation - the cluster should remain available for data ingestion and searches.
In practice, the opposite happened.
The cluster didn’t respond to client requests, and we saw latency of between 20 and 70 seconds in both searches and data ingestion.
We came up with another theory - perhaps the cluster wasn’t responding externally, but internally it was still working.
That didn’t hold up either - we ran a local curl request to the _cat/nodes API from the CLI of one of the master-node pods, and still there was no response until around 40 seconds later.
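For reference, the in-pod check was essentially the following (cluster name, namespace, and pod name are placeholders); even run locally like this, the response took around 40 seconds:

```sh
# Read the elastic user's password from the secret ECK creates (placeholder names)
PASSWORD=$(oc get secret my-cluster-es-elastic-user -n my-namespace \
  -o go-template='{{.data.elastic | base64decode}}')

# Run the request from inside one of the master pods and time it
oc exec -n my-namespace my-cluster-es-master-nodes-0 -- \
  bash -c "time curl -sk -u \"elastic:$PASSWORD\" https://localhost:9200/_cat/nodes?v"
```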
We are out of ideas right now.
If anyone here is using ECK in a production environment and knows what’s going on, we would love to hear about it.