Hello
I have an application that is deployed via Jenkins to an EKS cluster. One of the application's services is an Elasticsearch service. I have deployed it successfully over a dozen times this year, but the deployment has suddenly started failing, even though no code or configuration has changed since the previous run (2 months ago), which was successful.
In the Jenkins console, I get no errors. The job appears to run to successful completion.
In the Amazon OpenSearch console, the Elasticsearch domain exists, but its cluster health hasn't been assigned a colour. In the Cluster Health tab, the 'Master instance connection status health' graph is red, and the Instance Health tab shows no data nodes. No log information is available. When I open the domain endpoint (https://vpc-elasticsearch-es-dev-3e7b3xxxxxxxxxxxygysdsa.eu-west-2.es.amazonaws.com), I get this response (note the _na_ value of cluster_uuid):
{
  "name" : "0bcaa110a7c92e445fb4f22f26b805e",
  "cluster_name" : "51xxxxxxxx48:elasticsearch-es-dev",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "unknown",
    "build_date" : "2021-05-21T19:55:32.869571Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
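As a rough diagnostic, this check can be scripted. In Elasticsearch 7.x, "_na_" is the placeholder cluster UUID a node usually reports before it has received a cluster state from an elected master, which would be consistent with the red master connection graph. The snippet below is a minimal sketch using only the Python standard library; cluster_uuid_known is a hypothetical helper name, and ROOT_RESPONSE is an abridged copy of the response above.

```python
import json

# Abridged copy of the root-endpoint response quoted above.
ROOT_RESPONSE = """
{
  "name" : "0bcaa110a7c92e445fb4f22f26b805e",
  "cluster_name" : "51xxxxxxxx48:elasticsearch-es-dev",
  "cluster_uuid" : "_na_",
  "version" : { "number" : "7.4.2", "build_flavor" : "oss" },
  "tagline" : "You Know, for Search"
}
"""


def cluster_uuid_known(root_body: str) -> bool:
    """Return False when the node reports the placeholder cluster UUID "_na_".

    A missing cluster_uuid field is treated the same as "_na_".
    """
    info = json.loads(root_body)
    return info.get("cluster_uuid", "_na_") != "_na_"


print(cluster_uuid_known(ROOT_RESPONSE))  # False
```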
When I ask the endpoint to list its indices (_cat/indices/), I get this response:
{
  "message": "No server available to handle the request"
}
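To check the same symptom from a script, for example as a step after the deployment in Jenkins so that the job fails instead of reporting success, one could poll the _cluster/health API until it reports green or yellow. This is a minimal sketch, assuming the script runs somewhere that can reach the VPC domain (inside the VPC or a connected network) and that the domain's access policy allows unsigned requests; a policy that requires IAM authentication would need SigV4-signed requests instead. ENDPOINT, fetch and wait_for_health are hypothetical names, and the fetcher and sleep parameters exist only so the polling logic can be exercised without a live domain.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical placeholder; substitute the real VPC domain endpoint.
ENDPOINT = "https://vpc-example.eu-west-2.es.amazonaws.com"

Fetcher = Callable[[str], Tuple[int, str]]


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET url and return (HTTP status, body); HTTP error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def wait_for_health(
    endpoint: str,
    fetcher: Fetcher = fetch,
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll _cluster/health until it reports green or yellow.

    Returns "green" or "yellow" on success; otherwise the last status parsed
    from a 200 response ("red"), or None if none was ever parsed.
    """
    url = endpoint.rstrip("/") + "/_cluster/health"
    last: Optional[str] = None
    for attempt in range(attempts):
        try:
            status, body = fetcher(url)
        except OSError:
            # Connection refused, DNS failure, timeout, etc. (URLError is an OSError).
            status, body = 0, ""
        if status == 200:
            try:
                last = json.loads(body).get("status")
            except (ValueError, AttributeError):
                last = None
            if last in ("green", "yellow"):
                return last
        # Anything else, e.g. {"message": "No server available to handle the request"},
        # is treated as "not ready yet" and retried after a delay.
        if attempt < attempts - 1:
            sleep(delay)
    return last
```

In a pipeline, a wrapper could call wait_for_health(ENDPOINT) and exit non-zero whenever the result is not "green" or "yellow".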
In the Amazon EKS console, all pods in the cluster, including the one that was responsible for creating the Elasticsearch service, have a status of 'Succeeded'.
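Note that in Kubernetes a pod phase of 'Succeeded' only means that all of the pod's containers terminated successfully and will not be restarted; it does not by itself say whether the OpenSearch domain the pod created is healthy. The phases can also be listed from the command line. This is a minimal sketch, assuming kubectl is installed and configured for the EKS cluster; pod_phases and current_pod_phases are hypothetical helper names.

```python
import json
import subprocess
from typing import Dict


def pod_phases(kubectl_json: str) -> Dict[str, str]:
    """Map "namespace/name" to status.phase from `kubectl get pods -o json` output."""
    data = json.loads(kubectl_json)
    phases: Dict[str, str] = {}
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        key = f'{meta.get("namespace", "default")}/{meta.get("name", "?")}'
        phases[key] = item.get("status", {}).get("phase", "Unknown")
    return phases


def current_pod_phases() -> Dict[str, str]:
    """Run kubectl against the current context and return the pod phases."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return pod_phases(out)
```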
I've tried a few fixes, to no avail:
- Checked that there's sufficient disk space available on the Jenkins and EKS EC2 instances.
- Updated the Elasticsearch image from Docker Hub, first verifying that I could successfully create an Elasticsearch service from this same image on localhost using docker-compose.
- Checked log files in the Docker and EKS EC2 instances.
- Replayed the previous (successful) build from 2 months ago.
Any insight into the possible cause of the issue would be appreciated, thanks.