We are running Elasticsearch 6.2.4 deployed on Kubernetes, with 2 data nodes, 3 master nodes, and 1 replica per shard.
All of our shards are currently unassigned, and we have not been able to get them reassigned.
curl -X GET "http://$ES_HOST/_cluster/health?pretty"
{
  "cluster_name" : "es-cluster-dogfood",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 20,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}
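A per-shard listing tells the same story, with every shard UNASSIGNED (full output omitted for brevity):
curl -X GET "http://$ES_HOST/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node"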
It seems that both of our data nodes restarted within a few minutes of each other.
Logs from the Master:
2019-04-29 11:13:35.474 PDT[2019-04-29T18:13:35,474][INFO ][o.e.c.r.a.AllocationService] [deployment-es-master-dogfood-f657f77d5-ldbbb] Cluster health status changed from [GREEN] to [YELLOW] (reason: [{deployment-es-master-dogfood-f657f77d5-8msdl}{hmLZaF8BSl-wadr7XCGhyw}{uvBJAIuZT2-rlY4TxgvLHA}{10.20.153.8}{10.20.153.8:9300} transport disconnected, {statefulset-es-data-1}{-anHNjfhSHCn_m3mF731uQ}{Wo4RU_3vQyy0dVaN916M2Q}{10.20.153.9}{10.20.153.9:9300} transport disconnected]).
2019-04-29 11:16:08.067 PDT[2019-04-29T18:16:08,066][INFO ][o.e.c.r.a.AllocationService] [deployment-es-master-dogfood-f657f77d5-ldbbb] Cluster health status changed from [YELLOW] to [RED] (reason: [{statefulset-es-data-0}{VDpIAAVwQ-yvXjBw4Kvulg}{DbCjdJOvSqqmaWvYdAs3Kg}{10.20.164.3}{10.20.164.3:9300} left]).
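We pulled the data-node logs from the previous container instances with kubectl (pod names here follow our StatefulSet naming; not pasting the full logs):
kubectl logs statefulset-es-data-0 --previous
kubectl logs statefulset-es-data-1 --previous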
The logs from each data node line up with this, showing the node re-initializing at those same times. Now none of our shards will reassign to a node. Running the cluster allocation explain API against one of our indices gives the following:
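The request was roughly this (the exact body may have differed slightly; index and shard are taken from the output):
curl -X GET "http://$ES_HOST/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : "auditlog",
  "shard" : 0,
  "primary" : true
}
'
and it returned: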
{
  "index" : "auditlog",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-04-29T18:16:08.061Z",
    "details" : "node_left[VDpIAAVwQ-yvXjBw4Kvulg]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "Vb4Su_pxTa2IaZ4F4FPDxw",
      "node_name" : "statefulset-es-data-1",
      "transport_address" : "10.20.153.9:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "pIYGxPfuTOqHxMxPDyiccg",
      "node_name" : "statefulset-es-data-0",
      "transport_address" : "10.20.164.6:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}
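If it is useful, we can also post the output of the indices shard stores API for the affected indices, e.g.:
curl -X GET "http://$ES_HOST/auditlog/_shard_stores?pretty"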
Even using allocate_stale_primary does not seem to work. This request:
curl -X POST "http://$ES_HOST/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands": [{
    "allocate_stale_primary": {
      "index": "counter",
      "shard": 0,
      "node": "statefulset-es-data-0",
      "accept_data_loss": true
    }
  }]
}
'
results in this failure when the shard attempts recovery:
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2019-04-30T01:05:32.469Z",
"failed_attempts" : 3,
"delayed" : false,
"details" : "failed shard on node [pIYGxPfuTOqHxMxPDyiccg]: failed recovery, failure RecoveryFailedException[[counter][0]: Recovery failed on {statefulset-es-data-0}{pIYGxPfuTOqHxMxPDyiccg}{bTGu_2LQRZebahRunoedcw}{10.20.164.6}{10.20.164.6:9300}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: FileNotFoundException[no segments* file found in store(MMapDirectory@/data/data/nodes/0/indices/vNEkqOlKQPKFZ1kkIEAohw/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@530e9c6e): files: []]; ",
"allocation_status" : "no_valid_shard_copy"
}
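The only other reroute command we are aware of is allocate_empty_primary, which as we understand it would bring the shard back completely empty, so we have held off on that for now:
curl -X POST "http://$ES_HOST/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands": [{
    "allocate_empty_primary": {
      "index": "counter",
      "shard": 0,
      "node": "statefulset-es-data-0",
      "accept_data_loss": true
    }
  }]
}
'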
It seems like we incurred some sort of data corruption because both of our data nodes went down at roughly the same time. Is there anything else we can do to debug this, or to try to get these indices back up? We are OK with some data loss, but would prefer not to lose the entire index. We are also quite concerned about the stability and resiliency of the system if corruption like this can happen any time a shard loses all of its copies at once.
Thanks