Error on cluster downsizing if the master node is stopped

Hi,

I am using the quickstart steps from: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html.

I am using EKS 1.13.

If I scale up to 3 nodes and then back down to 2 nodes, and the node that gets shut down is the master node, I can see the following error:

{"type": "server", "timestamp": "2019-08-19T14:24:40,552+0000", "level": "ERROR", "component": "o.e.x.s.a.TokenService", "cluster.name": "quickstart", "node.name": "quickstart-es-bsdgq9mjrx", "cluster.uuid": "IpkatT_qTaqok3H0pIiqRQ", "node.id": "EppdBoAFT2ajg5hrLwba5g", "message": "unable to install token metadata" ,
"stacktrace": ["org.elasticsearch.cluster.NotMasterException: no longer master. source: [install-token-metadata]"] }

In the terminal tab where 'kubectl port-forward service/quickstart-es-http 9200' is running:

E0819 17:15:44.177539 84646 portforward.go:362] error creating forwarding stream for port 9200 -> 9200: Timeout occured
Handling connection for 9200

The curl commands end with:
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:9200

The problem occurs only if the node shut down during resizing is the master node.

If I restart 'kubectl port-forward service/quickstart-es-http 9200' everything works again.

Is this a problem on my side?

Thank you,
Georgeta

Hi Georgeta,

This is indeed kubectl port-forward behaviour: it binds to a single pod, and if that pod goes down, you need to restart the port-forward command.
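As a workaround, you can wrap the command in a simple loop so it reconnects automatically whenever the bound pod goes away. A minimal sketch, assuming the service name from the quickstart:

```shell
#!/bin/sh
# Restart the port-forward whenever it exits, e.g. because the
# target pod was terminated during a cluster resize.
while true; do
  kubectl port-forward service/quickstart-es-http 9200
  echo "port-forward exited, reconnecting in 2s..." >&2
  sleep 2
done
```

This is not a production-grade solution (each reconnect may still drop an in-flight request), but it avoids having to manually restart the command after every resize.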

Does your cluster go back to a healthy state after a few seconds? Killing the master node on Elasticsearch 6.x may lead to a longer period of unavailability compared to Elasticsearch 7.x.