Recovering from a crash

Hi

Not sure if this is more of an ES or a Kibana topic, but I'll start with the ES part. I'm running a single-node cluster for development projects (official Docker containers on Podman). After consecutive power failures my cluster went into a red state. The status of the shards is the following:

.geoip_databases                                              0 p STARTED
.kibana-event-log-8.2.3-000001                                0 p STARTED
.kibana_8.2.3_001                                             0 p STARTED
.security-7                                                   0 p STARTED
.ds-ilm-history-5-2022.08.06-000002                           0 p STARTED
.ds-filebeat-8.2.3-2022.08.06-000002                          0 p STARTED
.ds-filebeat-8.2.3-2022.08.06-000002                          0 r UNASSIGNED CLUSTER_RECOVERED
.ds-.logs-deprecation.elasticsearch-default-2022.07.07-000001 0 p STARTED
.async-search                                                 0 p STARTED
.apm-custom-link                                              0 p STARTED
.metrics-endpoint.metadata_united_default                     0 p STARTED
.items-default-000001                                         0 p STARTED
.items-default-000001                                         0 r UNASSIGNED CLUSTER_RECOVERED
.ds-.logs-deprecation.elasticsearch-default-2022.08.06-000002 0 p STARTED
.tasks                                                        0 p STARTED
.kibana_security_session_1                                    0 p STARTED
.transform-internal-007                                       0 p STARTED
.apm-agent-configuration                                      0 p STARTED
.transform-notifications-000002                               0 p STARTED
metrics-endpoint.metadata_current_default                     0 p STARTED
.ds-ilm-history-5-2022.07.07-000001                           0 p STARTED
.kibana-event-log-8.2.3-000002                                0 p STARTED
.ds-filebeat-8.2.3-2022.07.07-000001                          0 p STARTED
.ds-filebeat-8.2.3-2022.07.07-000001                          0 r UNASSIGNED CLUSTER_RECOVERED
.kibana_task_manager_8.2.3_001                                0 p UNASSIGNED CLUSTER_RECOVERED
.lists-default-000001                                         0 p STARTED
.lists-default-000001                                         0 r UNASSIGNED CLUSTER_RECOVERED
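For reference, I pulled that listing with the cat shards API (the exact column selection below is roughly what I used, not a verbatim copy of my command):

GET _cat/shards?h=index,shard,prirep,state,unassigned.reason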

The internal health API reports this:

{
  "status": "red",
  "cluster_name": "duckasylum-dev-cluster",
  "components": {
    "cluster_coordination": {
      "status": "green",
      "indicators": {
        "instance_has_master": {
          "status": "green",
          "summary": "Health coordinating instance has a master node.",
          "details": {
            "coordinating_node": {
              "node_id": "5QbX3zL5RpSj7sBaNq5uAg",
              "name": "es-node01"
            },
            "master_node": {
              "node_id": "5QbX3zL5RpSj7sBaNq5uAg",
              "name": "es-node01"
            }
          }
        }
      }
    },
    "data": {
      "status": "red",
      "indicators": {
        "shards_availability": {
          "status": "red",
          "summary": "This cluster has 1 unavailable primary, 4 unavailable replicas.",
          "details": {
            "unassigned_replicas": 4,
            "restarting_primaries": 0,
            "restarting_replicas": 0,
            "initializing_primaries": 0,
            "started_replicas": 0,
            "initializing_replicas": 0,
            "unassigned_primaries": 1,
            "started_primaries": 22,
            "creating_primaries": 0
          },
          "impacts": [
            {
              "severity": 1,
              "description": "Cannot add data to 1 index [.kibana_task_manager_8.2.3_001]. Searches might return incomplete results."
            },
            {
              "severity": 3,
              "description": "Searches might return slower than usual. Fewer redundant copies of the data exist on 4 indices [.ds-filebeat-8.2.3-2022.08.06-000002, .ds-filebeat-8.2.3-2022.07.07-000001, .lists-default-000001, .items-default-000001]."
            }
          ]
        },
        "ilm": {
          "status": "green",
          "summary": "ILM is running",
          "details": {
            "ilm_status": "RUNNING",
            "policies": 23
          }
        }
      }
    },
    "snapshot": {
      "status": "green",
      "indicators": {
        "repository_integrity": {
          "status": "green",
          "summary": "No repositories configured.",
          "details": {}
        },
        "slm": {
          "status": "green",
          "summary": "No policies configured",
          "details": {
            "slm_status": "RUNNING",
            "policies": 0
          }
        }
      }
    }
  }
}
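(That report came from the health API, which is still experimental in 8.2; if I remember the path correctly it lives under _internal:)

GET _internal/_health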

My overall question is whether any of this is recoverable.
Is it possible to rebuild the 4 replica shards from their primaries?
If not, is it safe to delete the .ds-filebeat-8.2.3-2022.08.06-000002, .ds-filebeat-8.2.3-2022.07.07-000001, .lists-default-000001 and .items-default-000001 indices?
What can I do about the unavailable primary shard? The additional info I have is this allocation explanation:

{
  "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index": ".kibana_task_manager_8.2.3_001",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "CLUSTER_RECOVERED",
    "at": "2022-08-22T16:19:19.021Z",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "Elasticsearch can't allocate this shard because all the copies of its data in the cluster are stale or corrupt. Elasticsearch will allocate this shard when a node containing a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot.",
  "node_allocation_decisions": [
    {
      "node_id": "5QbX3zL5RpSj7sBaNq5uAg",
      "node_name": "es-node01",
      "transport_address": "172.27.224.10:9300",
      "node_attributes": {
        "ml.machine_memory": "1073741824",
        "xpack.installed": "true",
        "ml.max_jvm_size": "536870912"
      },
      "node_decision": "no",
      "store": {
        "in_sync": true,
        "allocation_id": "KMYiLhP-RW2bjoMXr54nRw",
        "store_exception": {
          "type": "corrupt_index_exception",
          "reason": "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by": {
            "type": "i_o_exception",
            "reason": "failed engine (reason: [merge failed])",
            "caused_by": {
              "type": "corrupt_index_exception",
              "reason": "checksum failed (hardware problem?) : expected=aecd6623 actual=65de26a (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/indices/eq-1_2VZQtaDg61SLQ42XA/0/index/_h962.kdd\")))"
            }
          }
        }
      }
    }
  ]
}
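(That output is from the allocation explain API. Since the note says a randomly chosen unassigned shard was explained, the same call can target the broken primary explicitly, something like:)

GET _cluster/allocation/explain
{
  "index": ".kibana_task_manager_8.2.3_001",
  "shard": 0,
  "primary": true
}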

If it is not recoverable, can I delete it and somehow instruct Kibana to recreate it? (Looking at the docs, I don't think I have much in that index anyway.)

If this is all 100% unrecoverable I would need to just reinstall everything (right?), but I would welcome the learning experience of trying to fix it (and also of adding another node on another server).

Thanks in advance!

Yes, you can delete this index. Just stop Kibana first, delete the index, and it will be recreated when Kibana starts again.
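Roughly like this, assuming a local cluster on the default port; the container name and the auth/TLS flags are placeholders you'd adjust for your setup:

# stop Kibana so it doesn't touch the index while you delete it
podman stop kibana
# delete the corrupted task manager index (adjust -u / certificate flags as needed)
curl -X DELETE -u elastic "https://localhost:9200/.kibana_task_manager_8.2.3_001" -k
# start Kibana again; it recreates its task manager index on boot
podman start kibana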


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.