Elasticsearch stuck in ApplyingChanges and reconciliation ended with failed predicates

fozzylyon · March 31, 2021, 7:48pm

ECK 1.0.1
k8s 1.16.9

The ECK operator's been stellar. But I ran into trouble deploying node resource and count changes at the same time.

In one cluster it seems to have worked as expected, but in another the Elasticsearch object is stuck in the ApplyingChanges phase:

  NAME         HEALTH   NODES   VERSION   PHASE             AGE
  es-cluster   green    17      7.9.3     ApplyingChanges   321d

The requested change increased the node count from 17 to 23 in total and changed the resources on the existing 17 nodes.
The new nodes were added successfully, but every reconciliation attempt reported the same entry under failed_predicates:

  do_not_restart_healthy_node_if_MaxUnavailable_reached

It listed all pre-existing data and master nodes as the cause of the failure.

I was using the default Update Strategy and change budget, so it should have been able to add all new nodes immediately and terminate 1 node at a time. But it didn't attempt to terminate any existing nodes.
And after 29 reconciliation attempts over ~60 seconds, it stopped trying.
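
For reference, the change budget mentioned above is set under spec.updateStrategy on the Elasticsearch resource. The fragment below is only a sketch. It writes out what the ECK documentation describes as the defaults (maxUnavailable of 1, and maxSurge left unset, which is treated as unbounded). The metadata name comes from the kubectl output above, and the nodeSets section is omitted, so this is not a complete manifest.

```yaml
# Sketch only: default-equivalent change budget, written out explicitly.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-cluster          # name taken from the kubectl output in this post
spec:
  version: 7.9.3
  updateStrategy:
    changeBudget:
      # maxSurge is left out on purpose; when unset, ECK may create all
      # new Pods at once (assumed default behaviour per the ECK docs).
      maxUnavailable: 1     # at most one Pod below the target count at a time
  # nodeSets: ...           # omitted here; required in a real manifest
```

With this budget, the six new nodes can be created immediately, while the existing 17 should be restarted one at a time, which matches the behaviour expected in the post.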

Is there a bug or known limitation in making both changes at the same time with that update strategy?
Is there a way to kick start the watcher again?

I've tried manually restarting nodes, and it has no effect on the Elasticsearch object.

Thanks in advance!

fozzylyon · April 6, 2021, 8:21pm

We found that the operator itself had crashed because its resource limits were too low. Fixing that solved everything.

Thibault_Richard · April 7, 2021, 2:56pm

Thanks for the update and sorry for not helping earlier.

Could you share more information about your use case? How many Elastic Stack components (Elasticsearch, Kibana, APM Server, Enterprise Search, and Beats) and how many nodes per component are managed by the ECK operator?

fozzylyon · April 8, 2021, 5:04pm

In this specific k8s cluster, there was only an Elasticsearch component: initially 3 master and 14 data nodes. No other components were deployed, but the operator crashed with an OOM when we added 6 more data nodes.

Initial resources:

  limits:
    cpu: 500m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 250Mi

Now:

  limits:
    cpu: 500m
    memory: 750Mi
  requests:
    cpu: 100m
    memory: 1500Mi
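
For anyone looking for where these values live, here is a rough sketch. It assumes the operator was installed from the stock all-in-one manifest, which runs it as a StatefulSet called elastic-operator in the elastic-system namespace, with a container named manager. Those names are assumptions and are not confirmed in this thread. The values shown are the initial ones listed above, so the memory figures are the ones to raise.

```yaml
# Sketch only: the part of the operator StatefulSet that holds its resources.
# Namespace, StatefulSet name and container name are assumed from the
# default all-in-one manifest and may differ in a customised install.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elastic-operator
  namespace: elastic-system
spec:
  template:
    spec:
      containers:
        - name: manager
          resources:
            limits:
              cpu: 500m
              memory: 500Mi   # initial value from this post; too low here
            requests:
              cpu: 100m
              memory: 250Mi   # initial value from this post
```

If the operator Pod is restarting, its last termination reason (for example OOMKilled) is usually the quickest way to confirm that these limits are the problem.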

system · May 6, 2021, 5:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
