ECK 0.8.0 keeps creating and deleting nodes

Hi,

I'm using ECK 0.8.0 to manage my es cluster in k8s. It has been running for more than half a year and worked fine. However, when I recently tried to scale up my data nodes (from 5 to 8), I observed some unexpected behaviour.

Here's my original setup:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: async-search
  namespace: elasticsearch
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
#some pod template settings
# ...
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: local
        selector:
          matchLabels:
            master: "true"
  - nodeCount: 5
    config:
      node.master: false
      node.data: true
      node.ingest: false
# rest same as master
  - nodeCount: 2
    config:
      node.master: false
      node.data: false
      node.ingest: false
# this is the coordinator node
# settings similar to above

  updateStrategy:
    changeBudget:
      maxSurge: 0
      maxUnavailable: 1

So basically I have 3 master nodes, 5 data nodes, and 2 coordinator nodes.
Here's what happened when I tried to add more nodes to scale:

  1. I tried to add 3 more coordinator nodes.
    expected: ECK adds 3 more coordinator nodes to the cluster
    observed: ECK added 1 coordinator node and 2 data nodes first, and because I didn't provision any PVs for the data nodes, 2 of them keep Pending. (A rough sketch of the kind of local PV I would need to provision for them is at the end of this post.)

  2. I tried to add 3 more data nodes and 1 more coordinator node:
    expected: ECK adds 3 more data nodes, waits for data migration to complete, and then terminates one of the old nodes and starts another one.
    observed: at some point the coordinator node was added correctly; however, the process of deleting and adding data nodes never stops. I checked the operator log, and it keeps printing:

{"level":"info","ts":1585191461.1530027,"logger":"driver","msg":"Calculated all required changes","to_create:":8,"to_keep:":6,"to_delete:":8}

This behaviour kept going for about 24 hours, and I thought it was never going to stop, so I changed maxUnavailable to 0. Because of this, it stopped deleting old nodes. However, it is still trying to create a new node.
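
Concretely, the update strategy section of my spec now looks like this (everything else is unchanged):

  updateStrategy:
    changeBudget:
      maxSurge: 0
      maxUnavailable: 0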

I don't know why this happens. Can anyone help me with this? BTW, I can't upgrade to 1.0 for now because this environment is heavily used in production, so it's not easy to migrate at the moment.
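
For reference, provisioning a local PV for one of the data nodes would look roughly like this; the PV name, label, path, and node name below are just placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-data-pv-example
  labels:
    data: "true" # would have to match the claim selector of the data node group
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local
  local:
    path: /mnt/disks/es-data # placeholder path on the host
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - some-worker-node # placeholder node name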

Thanks

@huntlyroad first, I would suggest using a more recent version of ECK. We completely changed the implementation in 0.9 to rely on StatefulSets instead of creating/deleting Pods directly. Unfortunately there's no easy migration path from 0.8 (the simplest is snapshot/restore in Elasticsearch).
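
For reference, once you do migrate, the same topology would look roughly like this under the v1 API. This is just a sketch: the nodeSet names are arbitrary, pod templates and claim selectors are omitted, and the data volume claim is conventionally named elasticsearch-data in 1.x, so double-check the details against the 1.0 docs.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: async-search
  namespace: elasticsearch
spec:
  version: 7.1.0
  nodeSets:
  - name: master
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: local
  - name: data
    count: 5
    config:
      node.master: false
      node.data: true
      node.ingest: false
  - name: coordinating
    count: 2
    config:
      node.master: false
      node.data: false
      node.ingest: false
  updateStrategy:
    changeBudget:
      maxSurge: 0
      maxUnavailable: 1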

2 of them keep Pending

Do you still have Pods stuck in Pending? You can try to delete them manually and see if it helps.

Also I'm not sure I get what's going on, but if this is about slowly replacing nodes, increasing maxUnavailable could help, especially if you have Pending Pods.

Also make sure you don't have a second ECK instance running somewhere that would cause conflicts here.

@sebgl What I suspect is that every time I update my spec.yaml file, ECK needs to completely recreate the cluster (at least for certain types of nodes, or a group of nodes). So because I have updated my file 3 times, even with just small configuration changes, it will try to rolling-update my nodes one by one until the final state is met (and it can't skip unapplied changes). Am I right?

Changes are re-calculated at every reconciliation: if you change the manifest 3 times, only the last one is taken into consideration. "In-between" changes should be ignored, unless they have already started (e.g. if a Pod was created by change number 2, it must be taken into account in the calculations required for change number 3).

@sebgl thanks for clearing that up for me. For now I am fine with maxUnavailable set to 0 and not provisioning any PVs for the pending Pods. We have decided to move on to v1.0 so that we can hopefully solve this problem entirely.

We have decided to move on to v1.0 so that we can hopefully solve this problem entirely.

This is great news :slight_smile:
FWIW we plan to do our best in maintaining backward-compatibility for the entire 1.x series, and we have no plans for any breaking 2.x change anytime soon.

The breaking change introduced from 0.8 (alpha) to 0.9 (beta) is quite exceptional.