Elasticsearch upgrade stuck - "Skipping deletion because of migrating data"


I've modified the requests, limits, and -Xms/-Xmx Java options in my cluster YAML.
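For reference, this is the kind of change I made, as an illustrative fragment (the names and values here are examples, not my exact spec, and the exact field layout depends on the operator version):

```yaml
# Illustrative Elasticsearch resource fragment (example values only)
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elasticsearch
spec:
  nodeSets:          # older (alpha) operator versions name this section differently
  - name: data
    count: 3
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms2g -Xmx2g"   # example heap size
          resources:
            requests:
              memory: 4Gi
              cpu: 1
            limits:
              memory: 4Gi
```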

The upgrade is stuck and the operator's logs show:

{"level":"info","ts":1577967197.4311237,"logger":"elasticsearch-controller","msg":"Updating status","iteration":5457,"namespace":"elasticsearch","es_name":"elasticsearch"}

{"level":"info","ts":1577967197.4311762,"logger":"generic-reconciler","msg":"Aggregated reconciliation results complete","result":{"Requeue":true,"RequeueAfter":10000000000}}

{"level":"info","ts":1577967197.4311981,"logger":"elasticsearch-controller","msg":"End reconcile iteration","iteration":5457,"took":0.784774123,"namespace":"elasticsearch","es_name":"elasticsearch"}

{"level":"info","ts":1577967207.4313903,"logger":"elasticsearch-controller","msg":"Start reconcile iteration","iteration":5458,"namespace":"elasticsearch","es_name":"elasticsearch"}

{"level":"info","ts":1577967207.4348965,"logger":"transport","msg":"Reconciling transport certificate secrets","namespace":"elasticsearch","es_name":"elasticsearch"}

{"level":"info","ts":1577967208.1634543,"logger":"driver","msg":"Calculated all required changes","to_create:":5,"to_keep:":4,"to_delete:":6,"namespace":"elasticsearch","es_name":"elasticsearch"}

{"level":"info","ts":1577967208.1636055,"logger":"driver","msg":"Calculated performable changes","schedule_for_creation_count":0,"schedule_for_deletion_count":1,"namespace":"elasticsearch","es_name":"elasticsearch"}

{"level":"info","ts":1577967208.2052684,"logger":"driver","msg":"Skipping deletion because of migrating data","namespace":"elasticsearch","es_name":"elasticsearch","pod_name":"elasticsearch-es-data-sx54vph58q"}

{"level":"info","ts":1577967208.2053738,"logger":"elasticsearch-controller","msg":"Updating status","iteration":5458,"namespace":"elasticsearch","es_name":"elasticsearch"}

I don't know what "Skipping deletion because of migrating data" means.

The cluster's previous state was yellow, because one data node had been evicted for exceeding its resource requests.

The operator's image is:


That message is because we need to restart a pod to apply the new settings, but we cannot restart it because data is currently being migrated. This behavior is much improved in the beta version of the operator and I would definitely recommend upgrading.
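If you want to watch the migration, `GET _cat/shards` lists shards in the `RELOCATING` state while data is being moved off a Pod. A minimal sketch of how to spot them, using made-up sample rows (in a real cluster you would capture the output from the `_cat/shards` endpoint instead):

```shell
# Save a _cat/shards snapshot, then count shards still moving.
# The rows below are illustrative sample data, not from your cluster.
cat <<'EOF' > shards.txt
.kibana_1 0 p STARTED    7 169kb 10.0.0.1 elasticsearch-es-data-abcdefghij
.kibana_1 0 r RELOCATING 7 169kb 10.0.0.2 elasticsearch-es-data-sx54vph58q
EOF
relocating=$(grep -c RELOCATING shards.txt)
echo "shards relocating: $relocating"
```

As long as that count is non-zero, the operator will keep skipping the Pod deletion.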

I will upgrade the operator once the cluster upgrade process finishes. But what can I do now?

The cluster is defined with 3 data nodes, but only two are running because one of them was evicted for going over its resource requests. Is the upgrade stuck because one more data node is needed to maintain data replication?

I also saw that I have two indices for Kibana. Could that be the cause?
green open .kibana_1 6MRg6ml7RJCRLBlb3dRw4Q 1 1 7 0 169kb 84.5kb
green open .kibana_2 iqNOu78jQYCf8k4prtHCMg 1 1 41 8 192.5kb 103.7kb
green open .kibana_task_manager_1 9AuHqBHoRPCdsfArt-zO_w 1 1 2 0 13.8kb 6.9kb
green open .kibana_task_manager_2


Are you using PersistentVolumes? How many Pods are currently running in your cluster?

The output of the following commands (stripped of any sensitive information) would be useful to understand what's going on:

  • kubectl get elasticsearch -o json
  • kubectl get pods
  • GET /_cat/shards (Elasticsearch request)
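As a hint of what to look for in the `_cat/shards` output: a yellow cluster typically has all primaries `STARTED` but some replica shards `UNASSIGNED`, because the third data node is gone. A quick way to count them, using made-up sample rows for illustration:

```shell
# Illustrative _cat/shards snapshot of a yellow cluster (sample data only):
cat <<'EOF' > shards.txt
.kibana_1 0 p STARTED    7 169kb 10.0.0.1 elasticsearch-es-data-aaaaaaaaaa
.kibana_1 0 r UNASSIGNED
.kibana_2 0 p STARTED   41 192kb 10.0.0.2 elasticsearch-es-data-bbbbbbbbbb
.kibana_2 0 r UNASSIGNED
EOF
# Column 4 is the shard state; unassigned replicas are what keeps the cluster yellow:
awk '$4 == "UNASSIGNED" { n++ } END { print n " unassigned shard(s)" }' shards.txt
```

If the count is non-zero and there is no node left to place those replicas on, the operator cannot safely drain and delete the Pod it wants to replace.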

It seems that the update process has finished. It probably just needed time...

Thanks for the support, and sorry for the trouble.

PS: What is the operator's final log message at the end of the update?