Primary shard not available

Hi

I am facing issues because the primary shard is not available. What could be the reason for this? Please suggest.

Caused by: org.elasticsearch.action.UnavailableShardsException: [.monitoring-kibana-6-2018.12.20][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-kibana-6-2018.12.20][0]] containing [index {[.monitoring-kibana-6-2018.12.20][doc][KbugymcBNIueI7r1ErSY], source[{"cluster_uuid":"lqoa3HJhRPyUAQUkLgFeqw","timestamp":"2018-12-20T07:59:07.413Z","interval_ms":10000,"type":"kibana_stats","source_node":{"uuid":"CiXHakymT_6VN-KMuIKsfQ","host":"myhost","transport_address":"10.240.0.14:9401","ip":"10.240.0.14","name":"elkkibana01.pod","timestamp":"2018-12-20T07:59:07.413Z"},"kibana_stats":{"kibana":{"uuid":"1a54c2b0-78e8-4d76-9dc3-6dd93e9a8f67","name":"myhost","index":".kibana","host":"0","transport_address":"0:9018","version":"6.4.2","snapshot":false,"status":"green"},"usage":{"xpack":{"reporting":{"available":true,"enabled":true,"browser_type":"phantom","_all":0,"csv":{"available":true,"total":0},"printable_pdf":{"available":false,"total":0},"status":{},"lastDay":{"_all":0,"csv":{"available":true,"total":0},"printable_pdf":{"available":false,"total":0},"status":{}},"last7Days":{"_all":0,"csv":{"available":true,"total":0},"printable_pdf":{"available":false,"total":0},"status":{}}}}}}}]}]]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:927) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:773) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:726) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:887) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:573) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) ~[elasticsearch-6.4.2.jar:6.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
... 1 more

Regards,

This can be answered with the allocation explain API:

GET /_cluster/allocation/explain
{
  "index": ".monitoring-kibana-6-2018.12.20",
  "shard": 0,
  "primary": true
}

If the result of that call is unclear, please share it here for further help.


Thank you for the reply. The explanation I got is as follows:

"reached the limit of ongoing initial primary recoveries [4], cluster setting [cluster.routing.allocation.node_initial_primaries_recoveries=4]"

I am considering increasing the recovery limit. What would be the repercussions of doing so?
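For reference, I assume the change would be made with the cluster settings API, along these lines (the value 8 here is just an example, not a recommendation):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 8
  }
}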

Running too many recoveries in parallel can result in some or all of them timing out and failing, and can consume too many cluster resources such as bandwidth.

However, primary recoveries are normally quite quick, so I'm a bit surprised that your cluster is stuck here. I'm curious about what other recoveries are taking place preventing this one. Can you share the output of GET _cat/recovery?
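If the full output is noisy, a narrower view such as the following may be easier to read (the column selection here is just a suggestion):

GET _cat/recovery?v&h=index,shard,time,type,stage,source_node,target_node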

https://dpaste.de/MPT6 is the output.

I do not see any incomplete recoveries in that output.

However, you seem to have far too many shards for the amount of data you are dealing with. You have multiple daily indices, each with 15 shards, with many shards smaller than 1MB in size and no shards larger than 400MB. This will certainly have an impact on your cluster performance. This article gives advice on sharding, but the main point is that you should aim for shards of around 40GB in size. I think you could reasonably reduce the number_of_shards parameter to 1 on all these indices, and extend some of the daily indices to be weekly or monthly instead.
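For example, an index template along these lines would apply that setting to newly created indices; the template name and pattern are only illustrative, so adjust them to match your own index naming:

PUT /_template/single-shard-example
{
  "index_patterns": ["my-daily-index-*"],
  "settings": {
    "index.number_of_shards": 1
  }
}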


I am working on your advice about the shards. However, my primary question is: if there are no incomplete recoveries, why is there an exception about the primary shard not being available? I reproduced the problem today, so I have posted the details.

Ok, a bit more context would help! It wasn't clear that there was any issue, and it remains unclear whether this is still the same issue.

Which shard is reported as unavailable? What does the allocation explain API say about it?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.