Hello,
Hope everyone is doing great.
We had a power outage and our ES server went down. When the server came back online, I noticed that the cluster health status was red:
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 26,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 16.129032258064516
I then checked the status of each shard (_cat/shards output):
index shard prirep state node unassigned.reason
index_expedients 1 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 3 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 4 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 2 p UNASSIGNED ALLOCATION_FAILED
index_expedients 2 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 0 r UNASSIGNED CLUSTER_RECOVERED
index_customers 1 p UNASSIGNED ALLOCATION_FAILED
index_customers 1 r UNASSIGNED CLUSTER_RECOVERED
index_customers 3 p UNASSIGNED ALLOCATION_FAILED
index_customers 3 r UNASSIGNED CLUSTER_RECOVERED
index_customers 2 p UNASSIGNED ALLOCATION_FAILED
index_customers 2 r UNASSIGNED CLUSTER_RECOVERED
index_customers 4 p UNASSIGNED ALLOCATION_FAILED
index_customers 4 r UNASSIGNED CLUSTER_RECOVERED
index_customers 0 p UNASSIGNED ALLOCATION_FAILED
index_customers 0 r UNASSIGNED CLUSTER_RECOVERED
index_general 1 p UNASSIGNED ALLOCATION_FAILED
index_general 1 r UNASSIGNED CLUSTER_RECOVERED
index_general 3 p UNASSIGNED ALLOCATION_FAILED
index_general 3 r UNASSIGNED CLUSTER_RECOVERED
index_general 2 p UNASSIGNED ALLOCATION_FAILED
index_general 2 r UNASSIGNED CLUSTER_RECOVERED
index_general 4 p UNASSIGNED ALLOCATION_FAILED
index_general 4 r UNASSIGNED CLUSTER_RECOVERED
index_general 0 p UNASSIGNED ALLOCATION_FAILED
index_general 0 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 1 p STARTED SPANBIWEB02
index_expedients 3 p STARTED SPANBIWEB02
index_expedients 4 p STARTED SPANBIWEB02
index_expedients 0 p STARTED SPANBIWEB02
.kibana 0 p STARTED SPANBIWEB02
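To make the listing above easier to read, here is a small Python sketch that tallies the rows by state and by primary/replica. It only parses the text pasted in this post and does not talk to the cluster; the names CAT_SHARDS and tally are just ones I made up for illustration. By this count, 11 primaries and 15 replicas are unassigned, which matches the 26 unassigned shards in the health output.

```python
from collections import Counter

# Rows copied from the _cat/shards listing in this post (header omitted).
CAT_SHARDS = """\
index_expedients 1 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 3 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 4 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 2 p UNASSIGNED ALLOCATION_FAILED
index_expedients 2 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 0 r UNASSIGNED CLUSTER_RECOVERED
index_customers 1 p UNASSIGNED ALLOCATION_FAILED
index_customers 1 r UNASSIGNED CLUSTER_RECOVERED
index_customers 3 p UNASSIGNED ALLOCATION_FAILED
index_customers 3 r UNASSIGNED CLUSTER_RECOVERED
index_customers 2 p UNASSIGNED ALLOCATION_FAILED
index_customers 2 r UNASSIGNED CLUSTER_RECOVERED
index_customers 4 p UNASSIGNED ALLOCATION_FAILED
index_customers 4 r UNASSIGNED CLUSTER_RECOVERED
index_customers 0 p UNASSIGNED ALLOCATION_FAILED
index_customers 0 r UNASSIGNED CLUSTER_RECOVERED
index_general 1 p UNASSIGNED ALLOCATION_FAILED
index_general 1 r UNASSIGNED CLUSTER_RECOVERED
index_general 3 p UNASSIGNED ALLOCATION_FAILED
index_general 3 r UNASSIGNED CLUSTER_RECOVERED
index_general 2 p UNASSIGNED ALLOCATION_FAILED
index_general 2 r UNASSIGNED CLUSTER_RECOVERED
index_general 4 p UNASSIGNED ALLOCATION_FAILED
index_general 4 r UNASSIGNED CLUSTER_RECOVERED
index_general 0 p UNASSIGNED ALLOCATION_FAILED
index_general 0 r UNASSIGNED CLUSTER_RECOVERED
index_expedients 1 p STARTED SPANBIWEB02
index_expedients 3 p STARTED SPANBIWEB02
index_expedients 4 p STARTED SPANBIWEB02
index_expedients 0 p STARTED SPANBIWEB02
.kibana 0 p STARTED SPANBIWEB02
"""


def tally(text):
    """Count shards per (state, role); role is 'primary' or 'replica'."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        _index, _shard, prirep, state = fields[:4]
        role = "primary" if prirep == "p" else "replica"
        counts[(state, role)] += 1
    return counts


counts = tally(CAT_SHARDS)
for (state, role), n in sorted(counts.items()):
    print(f"{state:<10} {role:<8} {n}")

# Share of active shards, for comparison with active_shards_percent_as_number.
started = sum(n for (state, _), n in counts.items() if state == "STARTED")
active_percent = 100.0 * started / sum(counts.values())
print(f"active shards: {active_percent:.3f}%")
```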
I'm getting this error for the primary shards that failed allocation (allocation explain output):
{
"index": "index_expedients",
"shard": 2,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2022-09-23T15:32:36.195Z",
"failed_allocation_attempts": 5,
"details": "failed shard on node [hfFii1P3QxKPvijusWAExA]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: EOFException[read past EOF. pos [1973] length: [4] end: [1973]]; ",
"last_allocation_status": "no"
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
"node_allocation_decisions": [
{
"node_id": "hfFii1P3QxKPvijusWAExA",
"node_name": "SPANBIWEB02",
"transport_address": "127.0.0.1:9300",
"node_attributes": {
"ml.machine_memory": "17179398144",
"xpack.installed": "true",
"ml.max_open_jobs": "20",
"ml.enabled": "true"
},
"node_decision": "no",
"store": {
"in_sync": true,
"allocation_id": "G30Aru58QhWc7kY2hrNecw"
},
"deciders": [
{
"decider": "max_retry",
"decision": "NO",
"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2022-09-23T15:32:36.195Z], failed_attempts[5], delayed=false, details[failed shard on node [hfFii1P3QxKPvijusWAExA]: shard failure, reason [failed to recover from translog], failure EngineException[failed to recover from translog]; nested: EOFException[read past EOF. pos [1973] length: [4] end: [1973]]; ], allocation_status[deciders_no]]]"
}
]
}
]
}
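The max_retry decider above says to manually call /_cluster/reroute?retry_failed=true. For reference, this is a minimal sketch of that call in Python, assuming the node answers over plain HTTP on localhost:9200 without authentication (both are my assumptions; adjust for your setup). The function names are hypothetical. Since the underlying failure is a translog that cannot be read past EOF, I would expect the retry to fail again with the same error, so this only shows what the suggested request looks like.

```python
import json
import urllib.request

# Assumption: default local HTTP endpoint, no TLS and no credentials.
BASE_URL = "http://localhost:9200"


def build_retry_request(base_url=BASE_URL):
    """Build the POST request that the max_retry decider message suggests."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


def retry_failed_allocations(base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response body."""
    req = build_retry_request(base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (not executed here because it needs a running node):
# result = retry_failed_allocations()
# print(result.get("acknowledged"))
```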
Unfortunately, I don't have a snapshot to restore from. Is there a way to recover these shards?
Regards,