Cluster suddenly blew up out of nowhere

Hi all,

Approximately an hour ago we started getting connection and indexing errors on our cluster 36b401. We didn't make any config changes or any other changes to our app. After a few minutes of retrying connections from various places, we issued a force restart through the Elastic website. That restart has been running for over an hour now, and only one of our two instances has come back.

Is anyone experiencing issues like this? I don't see Elastic reporting any issues and I've also emailed support. Thanks.

What is the output of the cluster health API?

Output below. One other note: the unassigned shards count hasn't improved for at least 20-30 minutes. Do we need to do an even harder restart of the cluster?

{
  "active_primary_shards": 107,
  "active_shards": 107,
  "active_shards_percent_as_number": 49.76744186046512,
  "cluster_name": "36b401<redacted>",
  "delayed_unassigned_shards": 0,
  "initializing_shards": 0,
  "number_of_data_nodes": 1,
  "number_of_in_flight_fetch": 0,
  "number_of_nodes": 2,
  "number_of_pending_tasks": 0,
  "relocating_shards": 0,
  "status": "red",
  "task_max_waiting_in_queue_millis": 0,
  "timed_out": false,
  "unassigned_shards": 108
}
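For what it's worth, the numbers above hang together: `number_of_data_nodes` is 1 while `number_of_nodes` is 2, which matches the second instance not coming back, and with that node gone roughly half the shards (including some primaries) are unassigned, hence the red status. A quick sanity check of the percentage, in plain Python, using the values copied from the output above:

```python
# Values taken from the cluster health response pasted above.
active_shards = 107
unassigned_shards = 108
total_shards = active_shards + unassigned_shards

# active_shards_percent_as_number is just active / total * 100.
percent_active = active_shards / total_shards * 100
print(percent_active)  # 49.76744186046512, matching the reported value
```

So the cluster isn't making progress; it's simply missing the node that held the other half of the shard copies.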

Is there anything else we can do to move this stuck restart along?

I cancelled the restart to see if that helps launch the failed node.

What is the size of your cluster? Do you have monitoring enabled?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.