Hello, I am new to Elasticsearch clusters, sorry for the long post. I am looking for help reviving some unassigned shards, which resulted from me abruptly shutting down the master node. This also affects Kibana, which has trouble reaching the Elasticsearch service.

I have 3 master nodes and 3 data nodes, running on a Kubernetes cluster, so they use persistent volumes (volumes that retain data across restarts). All was good until 2 weeks ago. For a few reasons, I had to restart the master and data nodes, which left the volumes in a mess: one of the data nodes and one of the master nodes failed to recognize their volumes. I assume this has to do with how I restarted these nodes. Fast forward: I was able to reattach the volumes to these master and data nodes, and I am trying to bring the cluster back up, but it fails.
Here are the checks I did, based on previous posts and blog links.
The cluster health, filtered down to shard counts, gives this result:
Executing curl against _cluster/health?filter_path=status,*_shards
{
"status": "red",
"active_primary_shards": 5,
"active_shards": 10,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 10,
"delayed_unassigned_shards": 0
}
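To sanity-check my reading of those numbers, here is a quick sketch (plain Python, just parsing the JSON above; as I understand it, red status specifically means at least one primary is unassigned, while yellow would mean only replicas are missing):

```python
import json

# Health output from _cluster/health?filter_path=status,*_shards, pasted above.
health = json.loads("""
{
  "status": "red",
  "active_primary_shards": 5,
  "active_shards": 10,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 10,
  "delayed_unassigned_shards": 0
}
""")

# Total shard copies the cluster knows about, in any state.
total = (health["active_shards"] + health["unassigned_shards"]
         + health["relocating_shards"] + health["initializing_shards"])
print(f"{health['unassigned_shards']} of {total} shard copies unassigned")
# → 10 of 20 shard copies unassigned
```

So exactly half the shard copies are unassigned, and since the cluster is red, that must include primaries.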
So, there are 10 unassigned shards. Looking into which shards are unassigned:
_cat/shards?h=index,shard,prirep,state,unassigned.reason
logstash-2022.02.21 0 p STARTED
logstash-2022.02.21 0 r STARTED
.monitoring-es-7-2022.02.22 0 p STARTED
.monitoring-es-7-2022.02.22 0 r STARTED
.kibana_2 0 p STARTED
.kibana_2 0 r STARTED
.kibana_task_manager_1 0 r UNASSIGNED REPLICA_ADDED
.kibana_task_manager_1 0 p UNASSIGNED CLUSTER_RECOVERED
logstash-2022.02.22 0 r STARTED
logstash-2022.02.22 0 p STARTED
.security-7 0 p UNASSIGNED CLUSTER_RECOVERED
.security-7 0 r UNASSIGNED REPLICA_ADDED
.monitoring-es-7-2022.02.21 0 p STARTED
.monitoring-es-7-2022.02.21 0 r STARTED
.tasks 0 r UNASSIGNED REPLICA_ADDED
.tasks 0 p UNASSIGNED CLUSTER_RECOVERED
.apm-agent-configuration 0 p UNASSIGNED CLUSTER_RECOVERED
.apm-agent-configuration 0 r UNASSIGNED REPLICA_ADDED
.kibana_1 0 r UNASSIGNED REPLICA_ADDED
.kibana_1 0 p UNASSIGNED CLUSTER_RECOVERED
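To tally the unassigned shards by reason, I ran this quick script over the pasted output (plain Python, nothing cluster-specific):

```python
from collections import Counter

# Raw output of _cat/shards?h=index,shard,prirep,state,unassigned.reason, pasted above.
cat_shards = """\
logstash-2022.02.21 0 p STARTED
logstash-2022.02.21 0 r STARTED
.monitoring-es-7-2022.02.22 0 p STARTED
.monitoring-es-7-2022.02.22 0 r STARTED
.kibana_2 0 p STARTED
.kibana_2 0 r STARTED
.kibana_task_manager_1 0 r UNASSIGNED REPLICA_ADDED
.kibana_task_manager_1 0 p UNASSIGNED CLUSTER_RECOVERED
logstash-2022.02.22 0 r STARTED
logstash-2022.02.22 0 p STARTED
.security-7 0 p UNASSIGNED CLUSTER_RECOVERED
.security-7 0 r UNASSIGNED REPLICA_ADDED
.monitoring-es-7-2022.02.21 0 p STARTED
.monitoring-es-7-2022.02.21 0 r STARTED
.tasks 0 r UNASSIGNED REPLICA_ADDED
.tasks 0 p UNASSIGNED CLUSTER_RECOVERED
.apm-agent-configuration 0 p UNASSIGNED CLUSTER_RECOVERED
.apm-agent-configuration 0 r UNASSIGNED REPLICA_ADDED
.kibana_1 0 r UNASSIGNED REPLICA_ADDED
.kibana_1 0 p UNASSIGNED CLUSTER_RECOVERED
"""

reasons = Counter()
for line in cat_shards.splitlines():
    # Columns: index, shard, prirep (p/r), state, unassigned.reason (if any)
    parts = line.split()
    if parts[3] == "UNASSIGNED":
        reasons[(parts[2], parts[4])] += 1

for (prirep, reason), n in sorted(reasons.items()):
    print(prirep, reason, n)
# → p CLUSTER_RECOVERED 5
# → r REPLICA_ADDED 5
```

So every unassigned primary is CLUSTER_RECOVERED and every unassigned replica is REPLICA_ADDED, which matches the 10 unassigned shards from the health output.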
Looking at the 'started' shards, they are all on the data nodes; I do not see any indices on the master nodes now. I do believe they were there before I made my changes, so I do not even know which master nodes these indices were related to.
A couple of other troubleshooting steps on these unassigned shards:
Here is part of the output of _cluster/reroute?retry_failed=true, which does not help either
(I did not paste the whole output here as it is too long):
{
"acknowledged": true,
"state": {
"cluster_uuid": "EQQ472KzSX6QcuQCs3jRuw",
"version": 30843835,
"state_uuid": "pKPDGAXlRhmIV-6RIUtYmg",
"master_node": "QfuHnzYfRS-e8hgkrLTQlQ",
"blocks": {},
"nodes": {
"6zS21m_HQFaBJolZQdEj7g": {
"name": "elastic-cluster-es-data-1",
"ephemeral_id": "ZoB5E487SYidstzd0J-fjA",
"transport_address": "10.233.93.153:9300",
"attributes": {
"xpack.installed": "true"
}
},
"QfuHnzYfRS-e8hgkrLTQlQ": {
"name": "elastic-cluster-es-master-0",
"ephemeral_id": "ipSIsbcFTpuPgTvJ9JMZbw",
"transport_address": "10.233.101.182:9300",
"attributes": {
"xpack.installed": "true"
}
},
................
}
}
},
"routing_table": {
"indices": {
".security-7": {
"shards": {
"0": [{
"state": "UNASSIGNED",
"primary": true,
"node": null,
"relocating_node": null,
"shard": 0,
"index": ".security-7",
"recovery_source": {
"type": "EXISTING_STORE",
"bootstrap_new_history_uuid": false
},
"unassigned_info": {
"reason": "CLUSTER_RECOVERED",
"at": "2022-02-21T18:15:40.290Z",
"delayed": false,
"allocation_status": "no_valid_shard_copy"
}
}, {
"state": "UNASSIGNED",
"primary": false,
"node": null,
"relocating_node": null,
"shard": 0,
"index": ".security-7",
"recovery_source": {
"type": "PEER"
},
"unassigned_info": {
"reason": "REPLICA_ADDED",
"at": "2022-02-21T18:17:23.298Z",
"delayed": false,
"allocation_status": "no_attempt"
}
.............
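Pulling the key fields out of the .security-7 entry above (a trimmed-down sketch in Python; the JSON is hand-copied from the reroute response, not the full document):

```python
import json

# The two .security-7 shard copies from the routing_table above, trimmed down.
security_shard_copies = json.loads("""
[
  {"state": "UNASSIGNED", "primary": true,
   "recovery_source": {"type": "EXISTING_STORE"},
   "unassigned_info": {"reason": "CLUSTER_RECOVERED",
                       "allocation_status": "no_valid_shard_copy"}},
  {"state": "UNASSIGNED", "primary": false,
   "recovery_source": {"type": "PEER"},
   "unassigned_info": {"reason": "REPLICA_ADDED",
                       "allocation_status": "no_attempt"}}
]
""")

for copy in security_shard_copies:
    kind = "primary" if copy["primary"] else "replica"
    info = copy["unassigned_info"]
    print(kind, info["reason"], info["allocation_status"])
# → primary CLUSTER_RECOVERED no_valid_shard_copy
# → replica REPLICA_ADDED no_attempt
```

If I read this right, no_valid_shard_copy on the primary means the master cannot find any node holding an on-disk copy of that shard, which would explain why retry_failed reroutes do not help; the replica shows no_attempt because it has nothing to recover from until a primary is assigned.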
When I query the shard stores:
_shard_stores?pretty
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
The result is the same for _cat/nodes?v.
Before executing the command below, I checked the master nodes and did not find any indices under /usr/share/elasticsearch/nodes/, though there were indices on the data nodes. Now, after executing the command, I do not find anything under the nodes folder at all. I am not sure if the reroute command cleaned everything out.
_cluster/reroute (per Fix common cluster issues | Elasticsearch Guide [master] | Elastic)
I do not know what kind of issue I have run into, but any help is greatly appreciated.