Fresh cluster all shards are unavailable

dg_hivebrite · December 2, 2019, 4:31pm

Hi,

When my cluster starts, its status is yellow:

{
  "cluster_name" : "datawarehouse",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 0,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 9,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}

In parallel, I check the Kibana logs (because the service can't start); the error is:

{"type":"log","@timestamp":"2019-12-02T15:58:39Z","tags":["security","error"],"pid":6,"message":"Error registering Kibana Privileges with Elasticsearch for kibana-.kibana: [unavailable_shards_exception] at least one primary shard for the index [.security-7] is unavailable"}

I check my shards:

curl -k -u "elastic:xxxxx" "https://datawarehouse-es-http:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" --silent
.security-7 0 p UNASSIGNED INDEX_CREATED
.kibana_task_manager_1 0 p UNASSIGNED INDEX_CREATED
.kibana_task_manager_1 0 r UNASSIGNED INDEX_CREATED
.kibana_1 0 p UNASSIGNED INDEX_CREATED
.kibana_1 0 r UNASSIGNED INDEX_CREATED
.apm-agent-configuration 0 p UNASSIGNED INDEX_CREATED
.apm-agent-configuration 0 r UNASSIGNED INDEX_CREATED
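The listing above can be tallied with a short script. This is only a sketch: the function name `summarize_shards` is made up for illustration, and it assumes the five columns requested in the `h=` parameter of the call above, with the last column empty for shards that are assigned.

```python
from collections import Counter

# Output copied from the _cat/shards call above.
# Columns: index, shard, prirep, state, unassigned.reason
CAT_SHARDS = """\
.security-7 0 p UNASSIGNED INDEX_CREATED
.kibana_task_manager_1 0 p UNASSIGNED INDEX_CREATED
.kibana_task_manager_1 0 r UNASSIGNED INDEX_CREATED
.kibana_1 0 p UNASSIGNED INDEX_CREATED
.kibana_1 0 r UNASSIGNED INDEX_CREATED
.apm-agent-configuration 0 p UNASSIGNED INDEX_CREATED
.apm-agent-configuration 0 r UNASSIGNED INDEX_CREATED
"""


def summarize_shards(text):
    """Count shards per (prirep, state, reason) in _cat/shards text output."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        prirep, state = fields[2], fields[3]
        # The reason column is empty for shards that are not unassigned.
        reason = fields[4] if len(fields) > 4 else ""
        counts[(prirep, state, reason)] += 1
    return counts


summary = summarize_shards(CAT_SHARDS)
for (prirep, state, reason), count in sorted(summary.items()):
    print(prirep, state, reason, count)
```

In this listing every primary and every replica is UNASSIGNED with reason INDEX_CREATED, so none of these shards has ever been allocated to a node.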

Then I check the cluster allocation:

curl -k -u "elastic:xxxxx" "https://datawarehouse-es-http:9200/_cluster/allocation/explain?pretty"
{
  "index" : ".kibana_1",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2019-12-02T15:34:31.969Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"
}

I also found this error in the Elasticsearch logs: org.elasticsearch.action.UnavailableShardsException: at least one primary shard for the index [.security-7] is unavailable

I check the cluster settings:

curl -k -u "elastic:xxxxx" https://datawarehouse-es-http:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "exclude" : {
            "_name" : "none_excluded"
          }
        }
      }
    }
  }
}

ECK version: 1.0.0-beta
elasticsearch version: 7.4.2
elasticsearch config: eck elastic config · GitHub (it's the config of 1 node)
cluster: 3 master & 3 data nodes

I tried booting a new cluster, but the problem persists.

If someone could help me understand or fix the issue, it would be awesome.

Anya_Sabo · December 2, 2019, 7:01pm

What is the status of the ES resources? kubectl get elasticsearch (or describe)

Also, the explain API can provide useful information for why shards are not allocated:

https://www.elastic.co/guide/en/elasticsearch/reference/6.0/cluster-allocation-explain.html#_explain_api_response

sebgl · December 3, 2019, 8:06am

@dg_hivebrite are you using PersistentVolumes? Can you share the Elasticsearch yaml manifest?
I'm wondering if one of the volumes hosting your data has been lost.

dg_hivebrite · December 3, 2019, 10:38am

@Anya_Sabo here is the output of kubectl get elasticsearch:

NAME            HEALTH   NODES   VERSION   PHASE   AGE
datawarehouse   yellow   6       7.4.2     Ready   19h

And the describe: es_decribe.yaml · GitHub

dg_hivebrite · December 3, 2019, 10:47am

@sebgl
Yes, I'm using persistent volumes. Actually, the cluster is managed in GKE.
The output of my pv:

NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-022edc16-1519-11ea-9b78-4201c0a8000a 10Gi RWO Delete Bound default/elasticsearch-data-datawarehouse-es-master-europe-west1-a-0 standard 19h
pvc-02dc04aa-1519-11ea-9b78-4201c0a8000a 10Gi RWO Delete Bound default/elasticsearch-data-datawarehouse-es-master-europe-west1-b-0 standard 19h
pvc-03745ee2-1519-11ea-9b78-4201c0a8000a 10Gi RWO Delete Bound default/elasticsearch-data-datawarehouse-es-master-europe-west1-c-0 standard 19h
pvc-03ed030a-1519-11ea-9b78-4201c0a8000a 10Gi RWO Delete Bound default/elasticsearch-data-datawarehouse-es-data-europe-west1-a-0 standard 19h
pvc-04639049-1519-11ea-9b78-4201c0a8000a 10Gi RWO Delete Bound default/elasticsearch-data-datawarehouse-es-data-europe-west1-b-0 standard 19h
pvc-04da5b0d-1519-11ea-9b78-4201c0a8000a 10Gi RWO Delete Bound default/elasticsearch-data-datawarehouse-es-data-europe-west1-c-0 standard 19h

Do you want the YAML file as it is before being sent to Kubernetes, or the output of a Kubernetes object?

dg_hivebrite · December 3, 2019, 11:02am

I don't understand why the allocation explain call lists no nodes:

curl "https://datawarehouse-es-http:9200/_cluster/allocation/explain?pretty&include_disk_info=true&include_yes_decisions=true"
{
  "index" : ".kibana_1",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2019-12-02T15:34:31.969Z",
    "last_allocation_status" : "no_attempt"
  },
  "cluster_info" : {
    "nodes" : { },
    "shard_sizes" : { },
    "shard_paths" : { }
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes"
}

sebgl · December 3, 2019, 12:13pm

In the ES config I can see:

cluster.routing.allocation.awareness.attributes:  all

The value here should match one of the existing node attributes. Based on the rest of the Elasticsearch spec, I can see you're using the attribute zone to distinguish groups of nodes.
You probably need to change the configuration to:

cluster.routing.allocation.awareness.attributes:  zone
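The matching rule described here can be illustrated with a small standalone check. This is a sketch, not something ECK or Elasticsearch provides: the function name `unmatched_awareness_attributes` is hypothetical, and the zones europe-west1-b and europe-west1-c are assumed from the volume claim names earlier in the thread (only europe-west1-a appears in the config shown).

```python
def unmatched_awareness_attributes(awareness_setting, node_attributes):
    """Return the awareness attribute names that no node defines.

    awareness_setting: the comma-separated value of
        cluster.routing.allocation.awareness.attributes
    node_attributes: one dict per node, mapping node.attr.<name> to its value
    """
    configured = [name.strip() for name in awareness_setting.split(",") if name.strip()]
    known = set()
    for attrs in node_attributes:
        known.update(attrs)  # collects the attribute names (dict keys)
    return [name for name in configured if name not in known]


# Assumed node attributes: one zone attribute per node set.
nodes = [
    {"zone": "europe-west1-a"},
    {"zone": "europe-west1-b"},
    {"zone": "europe-west1-c"},
]

print(unmatched_awareness_attributes("all", nodes))   # the value from the config
print(unmatched_awareness_attributes("zone", nodes))  # the suggested value
```

With these inputs, "all" is reported as unmatched because no node defines an attribute with that name, while "zone" matches every node.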

dg_hivebrite · December 3, 2019, 1:28pm

OK, so I tried the change, but it doesn't change anything.
Then I deleted my cluster and started a new one without cluster.routing.allocation.awareness.attributes. The cluster is still yellow, with the same issue.

sebgl · December 3, 2019, 1:43pm

Looking at your cluster again: can you double check it has at least one data node?
I can see 3 master nodes, and another master node with:

    Config:
      cluster.routing.allocation.awareness.attributes:  all
      node.attr.zone:                                   europe-west1-a
      node.data:                                        false
      node.master:                                      true
    Count:                                              1
    Name:                                               data-europe-west1-a

which, judging by its name, I guess was intended to have node.data: true?
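The same condition is visible in the cluster health output from the first post, which reports "number_of_data_nodes" : 0. A minimal sketch of that check follows; the function name `health_findings` and the wording of the messages are illustrative, and the JSON is a subset of the fields posted above.

```python
import json

# Subset of the _cluster/health response from the first post.
HEALTH_JSON = """
{
  "cluster_name": "datawarehouse",
  "status": "yellow",
  "number_of_nodes": 6,
  "number_of_data_nodes": 0,
  "active_primary_shards": 0,
  "unassigned_shards": 9
}
"""


def health_findings(health):
    """Return human-readable findings for a cluster health response."""
    findings = []
    if health.get("number_of_data_nodes", 0) == 0:
        findings.append(
            "no data nodes: %d node(s) joined, but none can hold shards"
            % health.get("number_of_nodes", 0)
        )
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0 and health.get("active_primary_shards", 0) == 0:
        findings.append("no primary shard is active (%d unassigned)" % unassigned)
    return findings


findings = health_findings(json.loads(HEALTH_JSON))
for finding in findings:
    print(finding)
```

Shards can only be allocated to data nodes, so a cluster where every node set has node.data: false leaves every shard unassigned no matter which allocation settings are in place.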

dg_hivebrite · December 3, 2019, 2:34pm

Oh yes, good catch, that was the issue. Thank you very much!
