Node not discovering master node properly, "Cluster state has not been recovered yet, cannot write to the [null] index" error [503]

Hello, I have 2 servers in 2 different locations that I am trying to join into a single cluster. Both are running Elasticsearch v8.1.3.

I have the master server, which is configured like this:

cluster.name: yyz-news-prod
node.name: node-yyz-1
cluster.initial_master_nodes: ["node-yyz-1"]
discovery.seed_hosts:
   - xx.xx.xxx.x
   - yy.yyy.yyy.yy

xpack.security.enrollment.enabled: true
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
http.host: [_local_, _site_]

And then the second server:

cluster.name: yyz-news-prod
node.name: node-yyz-2
cluster.initial_master_nodes: ["node-yyz-1"]
discovery.seed_hosts:
   - xx.xx.xxx.x
   - yy.yyy.yyy.yy

xpack.security.enrollment.enabled: true
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
http.host: [_local_, _site_]

I started with a fresh install on both servers, launched the master server first, then launched the second server. However, when I run curl --insecure https://localhost:9200/_cluster/health?pretty on the master server, I see only 1 node connected:

{
  "cluster_name" : "yyz-news-prod",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

When I do the same on the second server, using the auto-generated password for the elastic user on that server, I get the following 503 error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "status_exception",
        "reason" : "Cluster state has not been recovered yet, cannot write to the [null] index"
      }
    ],
    "type" : "authentication_processing_error",
    "reason" : "failed to promote the auto-configured elastic password hash",
    "caused_by" : {
      "type" : "status_exception",
      "reason" : "Cluster state has not been recovered yet, cannot write to the [null] index"
    }
  },
  "status" : 503
}

I don't know why my second node cannot find and connect to my master node. Ports 9200 and 9300 are forwarded on both servers, so I don't think it's a networking issue. Any suggestions?

Can you take a look at the logs on the server side? They should be more helpful than the error message returned to the client.

Thanks!

Hello, thank you for the response and sorry for the late reply.

The log at /var/log/elasticsearch/my-cluster-name.log on the master server keeps repeating the following line until the client disconnects:

[2022-05-02T01:32:11,627][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [node-yyz-1] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.0.0.207:9300, remoteAddress=/yy.yyy.yyy.yy:58074, profile=default}
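
To confirm that this warning is what keeps repeating, rather than some other error, the matching lines can be counted with grep. This is only a small sketch: the function name count_cert_rejections is made up for illustration, and the log path is passed in as an argument because the actual file name depends on the cluster name.

```shell
# Hypothetical helper: counts log lines that report a rejected transport certificate.
# $1 is the path to the Elasticsearch server log (the exact path is an assumption,
# e.g. the file under /var/log/elasticsearch/ named after the cluster).
count_cert_rejections() {
  grep -c "client did not trust this server's certificate" "$1"
}

# Example call, using the path quoted in this thread:
# count_cert_rejections /var/log/elasticsearch/my-cluster-name.log
```

A count that grows each time the second node retries would suggest the connection reaches port 9300 and is then dropped during the TLS handshake, which points at certificates rather than networking.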

I did not configure the certificates on either node; I just installed Elasticsearch on both via the apt package manager, and the installs auto-configured the security settings.

See: Reconfigure a node to join an existing cluster
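
The warning above says the connecting node does not trust node-yyz-1's transport certificate, which is consistent with each apt install having generated its own certificates. The enrollment flow from the linked page can be sketched roughly as below. The two tools are the ones shipped with Elasticsearch 8.x; the wrapper function names and the ES_BIN variable are made up for illustration, and the default path assumes the standard location used by the apt/deb package.

```shell
# Assumption: default bin directory of the apt/deb package; override ES_BIN if yours differs.
ES_BIN="${ES_BIN:-/usr/share/elasticsearch/bin}"

# Run on node-yyz-1 (the node that already formed the cluster), as root.
# Prints an enrollment token that a new node can use to join.
create_node_token() {
  "$ES_BIN/elasticsearch-create-enrollment-token" -s node
}

# Run on node-yyz-2, as root, while Elasticsearch is stopped there.
# $1 is the token printed by create_node_token on node-yyz-1.
# This replaces the auto-generated security configuration with one
# issued by the existing cluster.
reconfigure_with_token() {
  "$ES_BIN/elasticsearch-reconfigure-node" --enrollment-token "$1"
}
```

After reconfiguring, start Elasticsearch on node-yyz-2 and check _cluster/health on node-yyz-1 again. Note that the reconfigure tool may refuse to run on a node that has already been started, so it is worth checking the linked page for how to handle that case before trying this on node-yyz-2.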