ECK 0.8.0 keeps creating and deleting nodes

Hi,

I'm using ECK 0.8.0 to manage my Elasticsearch cluster in Kubernetes. I set it up more than half a year ago and it has worked fine. However, recently, when I tried to scale up my data nodes (from 5 to 8), I observed some unexpected behaviour.

Here's my original setup:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: async-search
  namespace: elasticsearch
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
# some pod template settings
# ...
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: local
        selector:
          matchLabels:
            master: "true"
  - nodeCount: 5
    config:
      node.master: false
      node.data: true
      node.ingest: false
# rest same as master
  - nodeCount: 2
    config:
      node.master: false
      node.data: false
      node.ingest: false
# this is the coordinator node group
# settings similar to above

  updateStrategy:
    changeBudget:
      maxSurge: 0
      maxUnavailable: 1

So basically I have 3 master nodes, 5 data nodes, and 2 coordinator nodes.
Here's what happened when I tried to add more nodes to scale out:

  1. I tried to add 3 more coordinator nodes.
    expected: ECK adds 3 more coordinator nodes to the cluster.
    observed: ECK first added 1 coordinator node and 2 data nodes, and because I hadn't provisioned any PVs for additional data nodes, 2 of them stayed pending.

  2. I tried to add 3 more data nodes and 1 more coordinator node (the exact nodeCount changes I applied are shown after the log line below).
    expected: ECK adds 3 more data nodes, waits for data migration to complete, then terminates one of the old nodes and starts another one.
    observed: the coordinator node was added correctly at some point; however, the process of deleting and adding data nodes never stops. I checked the operator log, and it keeps printing:

{"level":"info","ts":1585191461.1530027,"logger":"driver","msg":"Calculated all required changes","to_create:":8,"to_keep:":6,"to_delete:":8}

This behaviour went on for about 24 hours and I thought it was never going to stop, so I changed maxUnavailable to 0. Because of this, it stopped deleting old nodes; however, it is still trying to create a new node.
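
For clarity, the update strategy now looks like this (only maxUnavailable changed, from 1 to 0):

  updateStrategy:
    changeBudget:
      maxSurge: 0
      maxUnavailable: 0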

I don't know why this is happening. Can anyone help me with this? BTW, I can't upgrade to ECK 1.0 for now because this environment is heavily used in production, so migrating to 1.0 is not easy at the moment.

Thanks