Elasticsearch service deployed to EKS by Jenkins suddenly fails health check

alanwgmendel · October 27, 2021, 7:39am

Hello

I have an application that is deployed via Jenkins to an EKS cluster. One of the application's services is an Elasticsearch service. The deployment has run successfully over a dozen times this year, but it has suddenly started failing, even though no code or configuration has changed since the previous successful run two months ago.

The Jenkins console shows no errors, and the job appears to run to successful completion.

In the Amazon OpenSearch console, the Elasticsearch domain exists, but its cluster health hasn't been assigned a colour. In the Cluster Health tab, the 'Master instance connection status health' graph is red. The Instance Health tab shows no data nodes, and no log information is available. When I open the domain endpoint (https://vpc-elasticsearch-es-dev-3e7b3xxxxxxxxxxxygysdsa.eu-west-2.es.amazonaws.com), I get this response (note the _na_ value of cluster_uuid):

{
  "name" : "0bcaa110a7c92e445fb4f22f26b805e",
  "cluster_name" : "51xxxxxxxx48:elasticsearch-es-dev",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "unknown",
    "build_date" : "2021-05-21T19:55:32.869571Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
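For anyone hitting the same symptom, the _na_ check can be scripted. Elasticsearch typically reports "_na_" as the cluster_uuid until the node has joined a formed cluster with an elected master, which fits the red master connection graph. Below is a minimal sketch, assuming Python 3 and that the endpoint is reachable from wherever the script runs (a VPC endpoint like the one above is only reachable from inside the VPC, and the domain's access policy may require signed requests). The function names are hypothetical.

```python
import json
import urllib.request


def fetch_root(endpoint: str) -> str:
    """Hypothetical helper: GET the root of the domain endpoint.

    Assumes the endpoint is reachable and accepts unsigned requests.
    """
    with urllib.request.urlopen(endpoint, timeout=10) as resp:
        return resp.read().decode("utf-8")


def cluster_has_formed(root_response: str) -> bool:
    """Return True if the root response carries a real cluster UUID.

    A cluster_uuid of "_na_" (or a missing one) is treated as
    "no cluster has formed yet", e.g. because no master was elected.
    """
    info = json.loads(root_response)
    return info.get("cluster_uuid", "_na_") != "_na_"
```

Fed the response above, cluster_has_formed returns False, so a deployment job could run this check after creating the domain and fail loudly instead of reporting success.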

When I ask the endpoint to list the default indices (_cat/indices/), I get this response:

{
  "message": "No server available to handle the request"
}
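A healthy cluster and this failure return differently shaped bodies, which a script can tell apart. This is a sketch only, assuming the request adds ?format=json (without it, _cat/indices returns plain text), in which case a healthy cluster returns a JSON array of objects with an "index" key. The function name is hypothetical.

```python
import json
from typing import List


def list_indices(cat_response: str) -> List[str]:
    """Parse a _cat/indices?format=json body into index names.

    An object with a "message" key (like the response above) is
    treated as "the cluster is not serving requests".
    """
    data = json.loads(cat_response)
    if isinstance(data, dict) and "message" in data:
        raise RuntimeError(f"cluster not serving requests: {data['message']}")
    # Healthy case: one object per index, each with an "index" field.
    return [row["index"] for row in data]
```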

In the Amazon EKS console, all pods in the cluster, including the one responsible for creating the Elasticsearch service, have a status of 'Succeeded'.

I've tried a few fixes, to no avail:

Checked that there's sufficient disk space available on the Jenkins and EKS EC2 instances.
Updated the Elasticsearch image from Docker Hub, first verifying that I could successfully create an Elasticsearch service from this same image on localhost using docker-compose.
Checked log files in the Docker and EKS EC2 instances.
Replayed the previous (successful) build from 2 months ago.

Any insight into the possible cause of the issue would be appreciated, thanks.

warkolm · October 27, 2021, 7:43am

Welcome to our community!

Unfortunately, you will need to ask AWS about this, as it sounds like something is amiss with the deployment.

That's definitely not an Elasticsearch response, which is why we cannot help.

Alternatively, please consider upgrading to the original Elasticsearch that we make, which is available on the AWS Marketplace: AWS Marketplace: Elastic Cloud (Elasticsearch Service)

system · November 24, 2021, 7:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.