Primary shard is not active or isn't assigned to a known node

My master node keeps logging:

index [.marvel-es-2017.10.25], type [index_stats], id [AV9RswFhsIL8o1ZCN3Mi], message [UnavailableShardsException[[.marvel-es-2017.10.25][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4f100b34]]]

This happened after I restarted my cluster following a crash. Now my cluster health is as follows:

{
   "cluster_name": "es-online",
   "status": "red",
   "timed_out": false,
   "number_of_nodes": 48,
   "number_of_data_nodes": 47,
   "active_primary_shards": 28186,
   "active_shards": 56372,
   "relocating_shards": 2,
   "initializing_shards": 1,
   "unassigned_shards": 1,
   "delayed_unassigned_shards": 0,
   "number_of_pending_tasks": 0,
   "number_of_in_flight_fetch": 0,
   "task_max_waiting_in_queue_millis": 0,
   "active_shards_percent_as_number": 99.99645226522865
}

I have tried the following (but the error is still logged):
1. Deleted the .marvel-es-2017.10.25 index, but it was recreated automatically!
2. Set cluster.routing.allocation.disk.watermark to a more suitable value (settings sketch below).
3. Verified that all my nodes' disks have enough free space.
4. Tried to reroute .marvel-es-2017.10.25's shard to another node.
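
Roughly what I ran for step 2 (a sketch from memory; the exact percentages here are illustrative):

PUT /_cluster/settings
{
   "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%",
      "cluster.routing.allocation.disk.watermark.high": "95%"
   }
}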

Any advice? Please help.

You have too many shards for your cluster; that won't be causing this, but it's not helping. You should also upgrade from 2.x, as 6.0 will be out very soon.

If the primary and replica are not assigned then you can try to reallocate/reroute the primary, but that may cause data loss. You may just want to delete the index and let it be recreated, which means losing a bit of monitoring data.
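
For example, something along these lines on 2.x (a sketch; note that 2.x's reroute command is named allocate, the node name below is hypothetical, and allow_primary is what accepts the potential data loss):

POST /_cluster/reroute
{
   "commands": [
      {
         "allocate": {
            "index": ".marvel-es-2017.10.25",
            "shard": 0,
            "node": "some-data-node",
            "allow_primary": true
         }
      }
   ]
}

Or, to delete it instead:

DELETE /.marvel-es-2017.10.25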

Also, please don't post pictures of text; they are difficult to read and some people may not even be able to see them :slight_smile:

How many master nodes do you have?

Thanks, warkolm! Upgrading is a nice option, but I have to fix this problem first. When I try to reroute the primary, it says: No allocation command factory registered for name [allocate_replica]

I have two master nodes; one is also a data node and the other is not.

You will probably need to delete the index then, sorry to say.

That is really bad, especially for a cluster of this size; with only two master-eligible nodes you cannot set a safe majority for minimum_master_nodes, which risks split brain. See Important Configuration Changes | Elasticsearch: The Definitive Guide [2.x] | Elastic

When I try to delete the index, it is just recreated automatically, and I don't see auto_create_index in the 2.x docs. Is there another way to stop Marvel's index from being created automatically?
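
The only candidate I have found is marvel.agent.interval, which, if I'm reading the Marvel 2.x docs right, can be set to -1 to temporarily pause data collection:

PUT /_cluster/settings
{
   "transient": {
      "marvel.agent.interval": -1
   }
}

But I'm not sure that's the intended way to do it.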

I'd expect it to be recreated. But does the shard assign after the index is deleted and recreated?

Yes! After I deleted it, the cluster turned green, before the Marvel index was auto-created!

It's probably the high shard count then.

Can you delete any older indices?

I have deleted indices older than 30 days. There is still 40 TB of free space, and the remaining data takes 20 TB.
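
I did the cleanup by date pattern, roughly like this (the index pattern is just an illustration):

DELETE /logstash-2017.09.*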

And you still have that many shards?

Yes, that many shards are left. I can still create new indices in the cluster and it works normally, but sometimes a new index hits the same problem as .marvel-es-2017.10.25. The counts are still:

"active_primary_shards": 28186,
"active_shards": 56372,

Hi warkolm, after I deleted all the Marvel indices the cluster turned green, but it won't index data now:

{
   "cluster_name": "es-online",
   "status": "green",
   "timed_out": false,
   "number_of_nodes": 48,
   "number_of_data_nodes": 46,
   "active_primary_shards": 27946,
   "active_shards": 55892,
   "relocating_shards": 2,
   "initializing_shards": 0,
   "unassigned_shards": 0,
   "delayed_unassigned_shards": 0,
   "number_of_pending_tasks": 0,
   "number_of_in_flight_fetch": 0,
   "task_max_waiting_in_queue_millis": 0,
   "active_shards_percent_as_number": 100
}

When I try to index a doc:

POST /test/index2
{
"help":"plz"
}

it says:

[2017-10-31 11:52:12,610][INFO ][rest.suppressed          ] /test/index2 Params: {index=test, type=index2}
UnavailableShardsException[[test][3] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: index {[test][index2][AV9wjfXh_Ywzw8X7NZuQ], source[{
    "help":"plz"
}
]}]
       	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.retryBecauseUnavailable(TransportReplicationAction.java:660)
       	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:378)
       	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$3.onTimeout(TransportReplicationAction.java:520)
       	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
       	at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:574)
       	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       	at java.lang.Thread.run(Thread.java:745)

Have you faced this problem?

This is still related to you having too many shards; you need to reduce them.
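
For example, check the total via cluster health, then drop replicas on indices where you can tolerate losing the redundancy (the index pattern below is hypothetical):

GET /_cluster/health?filter_path=active_shards

PUT /logstash-*/_settings
{
   "index": {
      "number_of_replicas": 0
   }
}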

Alright, I will try to reduce them.

Hi warkolm, now I can index, but there is still another problem: primary shards are allocated successfully, but replicas are not.

{
   "_index": "test_666",
   "_type": "index12",
   "_id": "AV93n1AWDRFVlqgJCqc",
   "_version": 1,
   "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
   },
   "created": true
}
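
To see why the replica stays unassigned I checked _cat/shards (my best guess at the right tool, since 2.x has no allocation explain API):

GET /_cat/shards/test_666?v&h=index,shard,prirep,state,unassigned.reason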

Master log:

[2017-11-01 20:47:11,021][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [create-index [test_666], cause [auto(index api)]]: execute
[2017-11-01 20:47:11,022][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] creating Index [test_666], shards [1]/[1]
[2017-11-01 20:47:11,029][DEBUG][index.store              ] [es-nmg02-ecom-jpaas049] [test_666] using index.store.throttle.type [none], with index.store.throttle.max_bytes_per_sec [0b]
[2017-11-01 20:47:11,030][DEBUG][index.mapper             ] [es-nmg02-ecom-jpaas049] [test_666] using dynamic[true]
[2017-11-01 20:47:11,036][INFO ][cluster.metadata         ] [es-nmg02-ecom-jpaas049] [test_666] creating index, cause [auto(index api)], templates [template_test], shards [1]/[1], mappings [index12, test_index]
[2017-11-01 20:47:13,296][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing ... (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,296][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index cache (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][index.cache.query.index  ] [es-nmg02-ecom-jpaas049] [test_666] full cache clear, reason [close]
[2017-11-01 20:47:13,297][DEBUG][index.cache.bitset       ] [es-nmg02-ecom-jpaas049] [test_666] clearing all bitsets because [close]
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] clearing index field data (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing analysis service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing mapper service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index query parser service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closed... (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,324][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] cluster state updated, version [118952], source [create-index [test_666], cause [auto(index api)]]
[2017-11-01 20:47:13,325][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] publishing cluster state version 118952
[2017-11-01 20:47:13,417][DEBUG][cluster.action.shard     ] [es-nmg02-ecom-jpaas049] received shard started for [test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]], indexUUID [h3SmdYMBQUO3nQivMRcqSQ], message [after recovery from store], failure [Unknown]
[2017-11-01 20:47:23,328][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] set local cluster state to version 118952
[2017-11-01 20:47:23,393][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] previous [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:23,393][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] current [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:30,327][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [create-index [test_666], cause [auto(index api)]]: took 19.3s done applying updated cluster_state (version: 118952, uuid: f6K3vyBCRUuCTbhwVtEFDw)
[2017-11-01 20:47:30,327][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [shard-started ([test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]]), reason [after recovery from store]]: execute
[2017-11-01 20:47:32,746][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] cluster state updated, version [118953], source [shard-started ([test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]]), reason [after recovery from store]]
[2017-11-01 20:47:32,746][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] publishing cluster state version 118953
[2017-11-01 20:47:42,738][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] set local cluster state to version 118953

And still, the log continues:

[2017-11-01 20:47:42,832][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] previous [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:42,832][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] current [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:49,789][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [shard-started ([test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]]), reason [after recovery from store]]: took 19.4s done applying updated cluster_state (version: 118953, uuid: CDY0knCEQtq-9DyPf4_mAg)
[2017-11-01 20:47:49,789][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [put-mapping [index12]]: execute
[2017-11-01 20:47:49,789][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] creating Index [test_666], shards [1]/[1]
[2017-11-01 20:47:49,795][DEBUG][index.store              ] [es-nmg02-ecom-jpaas049] [test_666] using index.store.throttle.type [none], with index.store.throttle.max_bytes_per_sec [0b]
[2017-11-01 20:47:49,797][DEBUG][index.mapper             ] [es-nmg02-ecom-jpaas049] [test_666] using dynamic[true]
[2017-11-01 20:47:49,799][DEBUG][cluster.metadata         ] [es-nmg02-ecom-jpaas049] [test_666] update_mapping [index12] with source [{"index12":{"properties":{"god":{"type":"string","index":"not_analyzed"}}}}]
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing ... (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index cache (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][index.cache.query.index  ] [es-nmg02-ecom-jpaas049] [test_666] full cache clear, reason [close]
[2017-11-01 20:47:49,803][DEBUG][index.cache.bitset       ] [es-nmg02-ecom-jpaas049] [test_666] clearing all bitsets because [close]
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] clearing index field data (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing analysis service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing mapper service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index query parser service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closed... (reason [created for mapping processing])
[2017-11-01 20:47:49,806][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] cluster state updated, version [118954], source [put-mapping [index12]]
[2017-11-01 20:47:49,806][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] publishing cluster state version 118954
[2017-11-01 20:47:50,307][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] set local cluster state to version 118954
[2017-11-01 20:47:50,356][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] previous [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:50,356][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] current [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:50,506][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [put-mapping [index12]]: took 717ms done applying updated cluster_state (version: 118954, uuid: 0ZV2kG05SJG6p42IGYsDDQ)

That's likely the cause of the problem; 19.3s to apply a cluster state update is a massively long time :frowning:
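
You can check whether cluster state updates are queuing up behind each other with:

GET /_cat/pending_tasks?v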

How many shards do you have now?

I have deleted almost half; the number is now around 30 thousand. But from sending an indexing request to receiving a response, it takes several seconds instead of 717ms.