Hi,
While upgrading Elasticsearch and Kibana from version 7.0.1 to 7.8.0, the Kibana pod doesn't come up and shows the message below in the Kibana log.
We are using Helm to perform the upgrade.
{"type":"log","@timestamp":"2020-10-12T09:18:47Z","tags":["info","savedobjects-service"],"pid":10,"message":"Starting saved objects migrations"}
{"type":"log","@timestamp":"2020-10-12T09:18:48Z","tags":["info","savedobjects-service"],"pid":10,"message":"Creating index .kibana_2."}
{"type":"log","@timestamp":"2020-10-12T09:18:48Z","tags":["warning","savedobjects-service"],"pid":10,"message":"Unable to connect to Elasticsearch. Error: [resource_already_exists_exception] index [.kibana_2/UIPMqE4jTP2b6luSANHY2A] already exists, with
{ index_uuid=\"UIPMqE4jTP2b6luSANHY2A\" & index=\".kibana_2\" }
"}
{"type":"log","@timestamp":"2020-10-12T09:18:48Z","tags":["warning","savedobjects-service"],"pid":10,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana."}
All Elasticsearch pods came up after the upgrade; only the Kibana pod failed to come up.
As described in the log message, if you have only a single Kibana instance attempting the migration, you can delete the .kibana_2 index and restart Kibana.
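For example, assuming Elasticsearch is reachable at localhost:9200 (adjust the host and add authentication for your setup), the stale index can be removed with the delete index API:

curl -X DELETE "http://localhost:9200/.kibana_2"

Then restart the Kibana pod so it retries the migration from scratch.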
We would like to know under what circumstances this issue happens.
Since the Helm upgrade process is automated in a Jenkins pipeline in our case, what action can be taken to avoid this? Is there any solution for this error apart from manually deleting the .kibana_2 index?
The main cause of this issue in a Kubernetes environment is a race condition where multiple Kibana pods attempt to migrate the index at the same time. In that case, the easiest solution is to set the deployment's (or replica set's) replica count to 1 when upgrading, then set it back to its initial value once the first pod completes the migration.
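As a rough sketch, assuming the Kibana deployment is named kibana in the default namespace and normally runs 3 replicas (names and counts will vary with your chart and values), that could look like:

kubectl scale deployment kibana --replicas=1
# run the helm upgrade and wait for the single pod to finish the saved objects migration, then
kubectl scale deployment kibana --replicas=3

The same scale-down/scale-up steps can be scripted as stages in the Jenkins pipeline around the helm upgrade step.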
If you only have one Kibana pod in the deployment, the cause is less obvious; it may be that the pod was killed or restarted while it was performing the migration.
As for solutions, once the cluster is in that state, manually deleting the index is the only option for now. However, we are actively working on a new migration workflow that should greatly reduce the risk of this kind of upgrade issue.