Hi, I'm running a three-node Elasticsearch 5.0.0 cluster and restarted one node for some maintenance. Since that restart, a set of shards seems not to be allocating, even onto a node that holds a valid copy. I have tried restarting each node in the cluster, but each time a new subset of shards ends up in the same state.
First, here is the general state of the cluster and shards:
$ curl -s 'http://localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "demo",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 200,
  "active_shards" : 399,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 54,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 88.0794701986755
}
Then I get a list of unassigned shards:
$ curl -s 'http://localhost:9200/_cat/shards?pretty' | grep UNASSIGNED
...
54 lines
...
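To narrow that listing down, a small sketch (assuming a POSIX shell with awk; `ES_URL` and the function name are placeholders, not from the original) that asks `_cat/shards` for specific columns, including `unassigned.reason` as I understand the 5.x column names, and prints only the unassigned shards with their reason:

```shell
# Sketch: print "index shard prirep reason" for every UNASSIGNED shard.
# Assumes _cat/shards was called with h=index,shard,prirep,state,unassigned.reason
# so the state is column 4 and the unassigned reason is column 5.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# Usage against a live cluster (ES_URL is a hypothetical placeholder):
# curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | unassigned_shards
```

Started shards have no reason column, so they are simply skipped by the state check.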
Next, I take one of those shards and check it with the cluster allocation explain API:
$ curl 'https://localhost:9200/_cluster/allocation/explain?pretty' -d '{
  "index": "foo",
  "shard": 0,
  "primary": true
}'
{
  "shard": {
    "index": "foo",
    "index_uuid": "idewF7BEQ9yVF1H9nfC6qg",
    "id": 0,
    "primary": true
  },
  "assigned": false,
  "shard_state_fetch_pending": false,
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2017-03-29T20:05:35.309Z",
    "failed_attempts": 1,
    "delayed": false,
    "details": "master marked shard as active, but shard has not been created, mark shard as failed",
    "allocation_status": "no_valid_shard_copy"
  },
  "allocation_delay_in_millis": 60000,
  "remaining_delay_in_millis": 0,
  "nodes": {
    "CXxx0SNOTGeCXCrrD9XWvg": {
      "node_name": "foo",
      "node_attributes": {
      },
      "store": {
        "shard_copy": "STALE"
      },
      "final_decision": "NO",
      "final_explanation": "the copy of the shard is stale, allocation ids do not match",
      "weight": 8.099999,
      "decisions": []
    },
    "-5m_OHBRTva2Br3mZSAR7A": {
      "node_name": "bar",
      "node_attributes": {
      },
      "store": {
        "shard_copy": "NONE"
      },
      "final_decision": "NO",
      "final_explanation": "there is no copy of the shard available",
      "weight": 8.65,
      "decisions": []
    },
    "2aIJFALXS320jeBCPU36Dw": {
      "node_name": "baz",
      "node_attributes": {
      },
      "store": {
        "shard_copy": "AVAILABLE"
      },
      "final_decision": "YES",
      "final_explanation": "the shard can be assigned and the node contains a valid copy of the shard data",
      "weight": 8.65,
      "decisions": []
    }
  }
}
Note that "unassigned_info" reports a failed allocation, while node "2aIJFALXS320jeBCPU36Dw" (baz) reports that it holds an AVAILABLE copy and that the shard can be assigned to it ("final_decision": "YES").
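Since each restart leaves a different subset of shards unassigned, it may help to run the same explain request for every unassigned primary rather than one at a time. A minimal sketch, assuming a POSIX shell; `ES_URL` and `explain_body` are hypothetical names, and the loop relies on curl's `-d @-` to read the request body from stdin:

```shell
# Sketch: build the allocation explain request body for one shard.
# Arguments: index name, shard number, "true" or "false" for primary.
explain_body() {
  printf '{"index": "%s", "shard": %d, "primary": %s}\n' "$1" "$2" "$3"
}

# Hypothetical loop over every unassigned primary (ES_URL is a placeholder):
# curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state" |
#   awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }' |
#   while read -r index shard; do
#     explain_body "$index" "$shard" true |
#       curl -s "$ES_URL/_cluster/allocation/explain?pretty" -d @-
#   done
```

Comparing the "unassigned_info.details" field across all of them would show whether every affected shard fails with the same "shard has not been created" message.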
I've tried requesting a retry of the failed allocations with:
curl -s -X POST 'https://localhost:9200/_cluster/reroute?retry_failed=true&pretty'
...
a lot of state
...
"acknowledged": true
I've also tried issuing an allocate_stale_primary reroute command targeting the node with the good data. The request returned "acknowledged": true, but the shard remained unassigned.
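For reference, a sketch of what such a reroute request body looks like. This assumes the 5.x reroute API, where, as I understand it, allocate_stale_primary also requires "accept_data_loss": true; the function name and `ES_URL` are hypothetical, and the node may need to be given by name or by ID depending on version:

```shell
# Sketch: build a reroute body that forces the copy on one node to become
# the primary. Arguments: index, shard number, node (name or ID).
# WARNING: accept_data_loss=true acknowledges that any writes missing from
# that copy are discarded.
stale_primary_body() {
  printf '{"commands": [{"allocate_stale_primary": {"index": "%s", "shard": %d, "node": "%s", "accept_data_loss": true}}]}\n' "$1" "$2" "$3"
}

# Usage (ES_URL is a placeholder); re-run the allocation explain call afterwards
# to see whether the shard actually started:
# stale_primary_body foo 0 baz | curl -s -X POST "$ES_URL/_cluster/reroute?pretty" -d @-
```

An "acknowledged": true response only means the master accepted the command, so the follow-up explain call is what confirms whether the shard was actually assigned.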
I've also made sure the cluster settings place no restrictions on allocation or rebalancing:
$ curl 'https://localhost:9200/_cluster/settings?pretty'
{
  "persistent" : { },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "allow_rebalance" : "always",
          "enable" : "all"
        }
      }
    }
  }
}
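For completeness, a sketch of how those two transient settings can be applied with the cluster settings API. It assumes a POSIX shell; `ES_URL` and the function name are hypothetical placeholders:

```shell
# Sketch: build a cluster settings body for the two allocation settings above.
# Arguments: value for "enable" (e.g. all) and for "allow_rebalance" (e.g. always).
allocation_settings_body() {
  printf '{"transient": {"cluster.routing.allocation.enable": "%s", "cluster.routing.allocation.allow_rebalance": "%s"}}\n' "$1" "$2"
}

# Usage (ES_URL is a placeholder):
# allocation_settings_body all always | curl -s -X PUT "$ES_URL/_cluster/settings?pretty" -d @-
```

Transient settings are lost on a full cluster restart, so a persistent block may be preferable if these values should survive one.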
EDIT: I've also confirmed that every node is well below the disk watermark thresholds for allocation; each is sitting at 12% disk usage.
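One way to double-check this is `_cat/allocation`, which reports disk usage per node. A small sketch, assuming a POSIX shell with awk; the default threshold of 85 mirrors the default low disk watermark (85%), and `ES_URL` and the function name are hypothetical:

```shell
# Sketch: print nodes whose disk.percent is at or above a threshold
# (default 85, the stock low watermark; pass another value if overridden).
# Input lines are "node disk.percent", e.g. from
#   curl -s "$ES_URL/_cat/allocation?h=node,disk.percent"
# Rows without a disk value (such as the UNASSIGNED row) are skipped.
nodes_over_watermark() {
  awk -v limit="${1:-85}" '$2 != "" && $2 + 0 >= limit + 0 { print $1, $2 }'
}
```

With every node at 12%, the function should print nothing, which rules out the disk-based allocation decider as the cause.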
Thanks for any advice.