Primary shard is not active or isn't assigned to a known node

small-tomorrow · October 25, 2017, 4:14am

My master node keeps logging:

index [.marvel-es-2017.10.25], type [index_stats], id [AV9RswFhsIL8o1ZCN3Mi], message [UnavailableShardsException[[.marvel-es-2017.10.25][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4f100b34]]]

This happened after I restarted my cluster following a cluster crash. My cluster health is now as follows:

{
   "cluster_name": "es-online",
   "status": "red",
   "timed_out": false,
   "number_of_nodes": 48,
   "number_of_data_nodes": 47,
   "active_primary_shards": 28186,
   "active_shards": 56372,
   "relocating_shards": 2,
   "initializing_shards": 1,
   "unassigned_shards": 1,
   "delayed_unassigned_shards": 0,
   "number_of_pending_tasks": 0,
   "number_of_in_flight_fetch": 0,
   "task_max_waiting_in_queue_millis": 0,
   "active_shards_percent_as_number": 99.99645226522865
}

I have tried the following, but the error is still logged:
1. Deleted the .marvel-es-2017.10.25 index, but it was recreated automatically!
2. Set cluster.routing.allocation.disk.watermark to a preferred value.
3. Confirmed that all my nodes' disks have enough space.
4. Rerouted .marvel-es-2017.10.25 to another node.

Any advice? Please help.

warkolm · October 25, 2017, 4:17am

You have too many shards for your cluster; that won't be causing this, but it's not helping. You should also upgrade from 2.X, as 6.0 will be out very soon.
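To put "too many shards" into numbers, here is a minimal sketch that divides the shard count by the number of data nodes. It only uses the values from the cluster health output posted above; the function name is made up for illustration.

```python
# Values copied from the cluster health output in the first post.
health = {
    "number_of_data_nodes": 47,
    "active_primary_shards": 28186,
    "active_shards": 56372,
}


def shards_per_data_node(h):
    """Average number of shard copies (primaries + replicas) per data node."""
    return h["active_shards"] / h["number_of_data_nodes"]


per_node = shards_per_data_node(health)
print(f"about {per_node:.0f} shards per data node")
```

That works out to roughly 1,200 shard copies on every data node, all of which the master has to track in the cluster state.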

If the primary and replica are not assigned then you can try to reallocate/reroute the primary, but that may cause data loss. You may just want to delete the index and let it be recreated, which means losing a bit of monitoring data.

Also, please don't post pictures of text; they are difficult to read and some people may not even be able to see them.

warkolm · October 25, 2017, 4:21am

How many master nodes do you have?

small-tomorrow · October 25, 2017, 4:24am

Thanks, warkolm! Upgrading is a nice option, but I have to fix this problem first. When I try to reroute the primary, it says: No allocation command factory registered for name [allocate_replica]
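The error above is consistent with `allocate_replica` being a reroute command from the 5.x API. As far as I recall, the 2.x reroute API only knows `move`, `cancel` and `allocate`, where `allocate` takes an `allow_primary` flag. Below is a minimal sketch that builds such a request body for `POST /_cluster/reroute`. The node name `some-data-node` is a placeholder, not a node from this cluster, and forcing a primary onto a node that holds no copy of the shard creates an empty shard, so any data in it is lost.

```python
import json


def build_allocate_command(index, shard, node, allow_primary=False):
    """Build a 2.x-style reroute body with a single `allocate` command."""
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,  # placeholder node name, adjust to a real data node
                    "allow_primary": allow_primary,  # True may discard data
                }
            }
        ]
    }


body = build_allocate_command(
    ".marvel-es-2017.10.25", 0, "some-data-node", allow_primary=True
)
print(json.dumps(body, indent=2))
```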

small-tomorrow · October 25, 2017, 4:25am

I have two master nodes; one is also a data node and the other is not.

warkolm · October 25, 2017, 4:25am

You will probably need to delete the index then, sorry to say.

Having only two master nodes is really bad, especially for a cluster of this size. See Important Configuration Changes | Elasticsearch: The Definitive Guide [2.x] | Elastic
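The reason two master-eligible nodes is a problem comes down to quorum arithmetic. The usual rule for `discovery.zen.minimum_master_nodes` in 2.x is (master-eligible nodes / 2) + 1, rounded down. A minimal sketch of that rule (the function name is made up for illustration):

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


for n in (2, 3):
    quorum = minimum_master_nodes(n)
    tolerated = n - quorum  # master-eligible nodes that can fail safely
    print(f"{n} master-eligible nodes: quorum {quorum}, tolerates {tolerated} failure(s)")
```

With two master-eligible nodes the quorum is 2, so losing either one stops master election, while setting the value to 1 instead opens the door to split brain. Three master-eligible nodes is the smallest setup that tolerates one failure.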

small-tomorrow · October 25, 2017, 4:31am

When I try to delete the index, it is just recreated automatically, and I don't see auto_create_index in the 2.x docs. Is there another way to stop Marvel's index from being created automatically?
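Two settings may be relevant here, both stated from memory of the 2.x documentation rather than from this thread, so please verify them against your version. `action.auto_create_index` is described in the 2.x index API docs as a node-level setting in elasticsearch.yml. For Marvel specifically, the Marvel 2.x settings list `marvel.agent.interval`, where `-1` pauses data collection and which can be changed through the cluster update settings API. A minimal sketch of the body for `PUT /_cluster/settings`, under those assumptions:

```python
import json


def disable_marvel_collection_body(persistent=False):
    """Build a cluster-settings body that pauses Marvel data collection.

    Assumes the Marvel 2.x setting `marvel.agent.interval`, where -1
    disables collection; check this against your installed version.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {"marvel.agent.interval": "-1"}}


body = disable_marvel_collection_body()
print(json.dumps(body))
```

A transient setting is used by default so that a full cluster restart restores normal collection.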

warkolm · October 25, 2017, 4:36am

I'd expect it to be recreated. But is it assigning even after being deleted?

small-tomorrow · October 25, 2017, 5:19am

Yes! After I deleted it, the cluster turned green before the marvel index was auto-created again!

warkolm · October 25, 2017, 5:31am

It's probably the high shard count then.

Can you delete any older indices?

small-tomorrow · October 25, 2017, 5:32am

I have deleted indices older than 30 days. There is still 40T of free space, and the remaining data takes up 20T.
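Combining the 20T figure with the shard counts from the cluster health output gives a rough average shard size. This is only a back-of-the-envelope sketch: the thread does not say whether 20T includes replicas, so both readings are computed, and 1T is taken as 1024 GB.

```python
TOTAL_DATA_TB = 20        # "the remaining data takes up 20T"
ACTIVE_SHARDS = 56372     # primaries + replicas, from cluster health
ACTIVE_PRIMARIES = 28186  # primaries only, from cluster health


def avg_shard_size_gb(total_tb, shard_count):
    """Average shard size in GB, assuming 1 TB = 1024 GB."""
    return total_tb * 1024 / shard_count


# Assumption A: 20T is the total on disk, replicas included.
avg_all_copies = avg_shard_size_gb(TOTAL_DATA_TB, ACTIVE_SHARDS)
# Assumption B: 20T counts primary data only.
avg_primaries = avg_shard_size_gb(TOTAL_DATA_TB, ACTIVE_PRIMARIES)

print(f"A: {avg_all_copies:.2f} GB per shard, B: {avg_primaries:.2f} GB per shard")
```

Under either reading the average shard is well under 1 GB, which suggests the shard count could be cut substantially without producing oversized shards.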

warkolm · October 25, 2017, 5:33am

And you still have that many shards?

small-tomorrow · October 25, 2017, 5:40am

Yes, that many shards are still left. I can still create new indices in the cluster and it works normally, but sometimes a new index runs into the same problem as .marvel-es-2017.10.25.

"active_primary_shards": 28186,
"active_shards": 56372,

small-tomorrow · October 31, 2017, 3:55am

Hi warkolm, after I deleted all the marvel indices, the cluster turned green, but now it can't index data.

{
   "cluster_name": "es-online",
   "status": "green",
   "timed_out": false,
   "number_of_nodes": 48,
   "number_of_data_nodes": 46,
   "active_primary_shards": 27946,
   "active_shards": 55892,
   "relocating_shards": 2,
   "initializing_shards": 0,
   "unassigned_shards": 0,
   "delayed_unassigned_shards": 0,
   "number_of_pending_tasks": 0,
   "number_of_in_flight_fetch": 0,
   "task_max_waiting_in_queue_millis": 0,
   "active_shards_percent_as_number": 100
}

When I try to index a doc:

POST /test/index2
{
"help":"plz"
}

it says:

[2017-10-31 11:52:12,610][INFO ][rest.suppressed          ] /test/index2 Params: {index=test, type=index2}
UnavailableShardsException[[test][3] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: index {[test][index2][AV9wjfXh_Ywzw8X7NZuQ], source[{
    "help":"plz"
}
]}]
       	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.retryBecauseUnavailable(TransportReplicationAction.java:660)
       	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:378)
       	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$3.onTimeout(TransportReplicationAction.java:520)
       	at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
       	at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:574)
       	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       	at java.lang.Thread.run(Thread.java:745)

Have you faced this problem?
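When this exception shows up repeatedly in the logs, it can help to pull out which index and shard number each message refers to. A minimal sketch using a regular expression over the message format seen in this thread (the pattern is written against these two samples only, not against every Elasticsearch log format):

```python
import re

# Matches e.g. "UnavailableShardsException[[test][3] Primary shard is not active"
PATTERN = re.compile(
    r"UnavailableShardsException\[\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]"
)


def affected_shard(message):
    """Return (index, shard_number) from the exception text, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return match.group("index"), int(match.group("shard"))


samples = [
    "UnavailableShardsException[[test][3] Primary shard is not active "
    "or isn't assigned to a known node. Timeout: [1m]",
    "message [UnavailableShardsException[[.marvel-es-2017.10.25][0] "
    "Primary shard is not active or isn't assigned to a known node.",
]
results = [affected_shard(s) for s in samples]
print(results)
```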

warkolm · October 31, 2017, 3:59am

This is still related to you having too many shards; you need to reduce them.
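One common way to reduce shard count is to use coarser time-based indices. The sketch below compares daily and monthly index names over a 30-day retention window. It assumes daily indices with one primary and one replica each, which the thread does not confirm for the non-Marvel data, and the `logs-` prefix is purely hypothetical.

```python
from datetime import date, timedelta


def index_names(prefix, end, days, fmt):
    """Distinct index names covering `days` days up to and including `end`."""
    return sorted(
        {prefix + (end - timedelta(days=i)).strftime(fmt) for i in range(days)}
    )


end = date(2017, 10, 25)
daily = index_names("logs-", end, 30, "%Y.%m.%d")  # hypothetical prefix
monthly = index_names("logs-", end, 30, "%Y.%m")

COPIES_PER_INDEX = 2  # assumed: 1 primary + 1 replica
print(len(daily) * COPIES_PER_INDEX, "shards with daily indices")
print(len(monthly) * COPIES_PER_INDEX, "shards with monthly indices")
```

The trade-off is that retention can then only be enforced a month at a time, since whole indices are deleted at once.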

small-tomorrow · October 31, 2017, 4:06am

Alright, I will try to reduce them.

small-tomorrow · November 1, 2017, 12:56pm

Hi warkolm, now I can index, but there is another problem: primary shards are allocated successfully, but replicas are not.

{
   "_index": "test_666",
   "_type": "index12",
   "id": "AV93n1AWDRFVlqgJCqc",
   "_version": 1,
   "_shards": {
      "total": 2,
      "successful": 1,
      "failed": 0
   },
   "created": true
}
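The `_shards` section is what shows the replica problem: `total` is the number of shard copies the write should reach, `successful` is how many acknowledged it, and a copy that is not yet assigned is counted in neither `successful` nor `failed`. A minimal sketch that makes the gap explicit (the function name is made up for illustration):

```python
def unacknowledged_copies(shards):
    """Shard copies that neither succeeded nor failed, e.g. unassigned replicas."""
    return shards["total"] - shards["successful"] - shards["failed"]


# Values from the index response above.
response_shards = {"total": 2, "successful": 1, "failed": 0}

missing = unacknowledged_copies(response_shards)
print(f"{missing} shard copy did not receive the write")
```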

Master log :

[2017-11-01 20:47:11,021][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [create-index [test_666], cause [auto(index api)]]: execute
[2017-11-01 20:47:11,022][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] creating Index [test_666], shards [1]/[1]
[2017-11-01 20:47:11,029][DEBUG][index.store              ] [es-nmg02-ecom-jpaas049] [test_666] using index.store.throttle.type [none], with index.store.throttle.max_bytes_per_sec [0b]
[2017-11-01 20:47:11,030][DEBUG][index.mapper             ] [es-nmg02-ecom-jpaas049] [test_666] using dynamic[true]
[2017-11-01 20:47:11,036][INFO ][cluster.metadata         ] [es-nmg02-ecom-jpaas049] [test_666] creating index, cause [auto(index api)], templates [template_test], shards [1]/[1], mappings [index12, test_index]
[2017-11-01 20:47:13,296][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing ... (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,296][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index cache (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][index.cache.query.index  ] [es-nmg02-ecom-jpaas049] [test_666] full cache clear, reason [close]
[2017-11-01 20:47:13,297][DEBUG][index.cache.bitset       ] [es-nmg02-ecom-jpaas049] [test_666] clearing all bitsets because [close]
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] clearing index field data (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing analysis service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing mapper service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index query parser service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,297][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closed... (reason [cleaning up after validating index on master])
[2017-11-01 20:47:13,324][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] cluster state updated, version [118952], source [create-index [test_666], cause [auto(index api)]]
[2017-11-01 20:47:13,325][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] publishing cluster state version 118952
[2017-11-01 20:47:13,417][DEBUG][cluster.action.shard     ] [es-nmg02-ecom-jpaas049] received shard started for [test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]], indexUUID [h3SmdYMBQUO3nQivMRcqSQ], message [after recovery from store], failure [Unknown]
[2017-11-01 20:47:23,328][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] set local cluster state to version 118952
[2017-11-01 20:47:23,393][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] previous [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:23,393][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] current [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:30,327][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [create-index [test_666], cause [auto(index api)]]: took 19.3s done applying updated cluster_state (version: 118952, uuid: f6K3vyBCRUuCTbhwVtEFDw)
[2017-11-01 20:47:30,327][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [shard-started ([test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]]), reason [after recovery from store]]: execute
[2017-11-01 20:47:32,746][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] cluster state updated, version [118953], source [shard-started ([test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]]), reason [after recovery from store]]
[2017-11-01 20:47:32,746][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] publishing cluster state version 118953
[2017-11-01 20:47:42,738][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] set local cluster state to version 118953

small-tomorrow · November 1, 2017, 12:56pm

The log, continued:

[2017-11-01 20:47:42,832][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] previous [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:42,832][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] current [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:49,789][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [shard-started ([test_666][0], node[PVKsxzxsRpKLQTrkymbO8g], [P], v[1], s[INITIALIZING], a[id=Tw6eA1HdQLqZOpIyucR5Mw], unassigned_info[[reason=INDEX_CREATED], at[2017-11-01T12:47:11.037Z]]), reason [after recovery from store]]: took 19.4s done applying updated cluster_state (version: 118953, uuid: CDY0knCEQtq-9DyPf4_mAg)
[2017-11-01 20:47:49,789][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [put-mapping [index12]]: execute
[2017-11-01 20:47:49,789][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] creating Index [test_666], shards [1]/[1]
[2017-11-01 20:47:49,795][DEBUG][index.store              ] [es-nmg02-ecom-jpaas049] [test_666] using index.store.throttle.type [none], with index.store.throttle.max_bytes_per_sec [0b]
[2017-11-01 20:47:49,797][DEBUG][index.mapper             ] [es-nmg02-ecom-jpaas049] [test_666] using dynamic[true]
[2017-11-01 20:47:49,799][DEBUG][cluster.metadata         ] [es-nmg02-ecom-jpaas049] [test_666] update_mapping [index12] with source [{"index12":{"properties":{"god":{"type":"string","index":"not_analyzed"}}}}]
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing ... (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index cache (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][index.cache.query.index  ] [es-nmg02-ecom-jpaas049] [test_666] full cache clear, reason [close]
[2017-11-01 20:47:49,803][DEBUG][index.cache.bitset       ] [es-nmg02-ecom-jpaas049] [test_666] clearing all bitsets because [close]
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] clearing index field data (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing analysis service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing mapper service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index query parser service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closing index service (reason [created for mapping processing])
[2017-11-01 20:47:49,803][DEBUG][indices                  ] [es-nmg02-ecom-jpaas049] [test_666] closed... (reason [created for mapping processing])
[2017-11-01 20:47:49,806][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] cluster state updated, version [118954], source [put-mapping [index12]]
[2017-11-01 20:47:49,806][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] publishing cluster state version 118954
[2017-11-01 20:47:50,307][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] set local cluster state to version 118954
[2017-11-01 20:47:50,356][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] previous [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:50,356][DEBUG][license.plugin.core      ] [es-nmg02-ecom-jpaas049] current [{"uid":"f682f45b-522f-4dde-8b94-59fc02d4e271","type":"basic","issue_date_in_millis":1509408000000,"expiry_date_in_millis":1541030399999,"max_nodes":100,"issued_to":"steven dullian (BD)","issuer":"Web Form","signature":"xx"}]
[2017-11-01 20:47:50,506][DEBUG][cluster.service          ] [es-nmg02-ecom-jpaas049] processing [put-mapping [index12]]: took 717ms done applying updated cluster_state (version: 118954, uuid: 0ZV2kG05SJG6p42IGYsDDQ)
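The `took ...` values in the `cluster.service` lines show how long the master needed to apply each cluster state update. A minimal sketch for extracting them; the sample lines are shortened copies of the log lines above, and the pattern only handles the `ms` and `s` units that appear in this log.

```python
import re

TOOK = re.compile(r"took (\d+(?:\.\d+)?)(ms|s)\b")


def took_seconds(line):
    """Return the 'took' duration of a log line in seconds, or None."""
    match = TOOK.search(line)
    if match is None:
        return None
    value = float(match.group(1))
    return value / 1000 if match.group(2) == "ms" else value


lines = [
    "processing [create-index [test_666], cause [auto(index api)]]: took 19.3s done applying updated cluster_state",
    "processing [shard-started ([test_666][0])]: took 19.4s done applying updated cluster_state",
    "processing [put-mapping [index12]]: took 717ms done applying updated cluster_state",
    "publishing cluster state version 118954",
]
durations = [took_seconds(line) for line in lines]
print(durations)
```

In this log, creating the index and starting its primary each took about 19 seconds, while the mapping update took well under a second.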

warkolm · November 1, 2017, 10:52pm

Taking about 19s to apply a cluster state update is likely the cause of the problem, as that's a massively long time.

How many shards do you have now?

small-tomorrow · November 2, 2017, 6:43am

I deleted almost half; now the number is about 30 thousand. From sending an indexing request to receiving a response, it takes several seconds instead of 717ms.
