Could not reassign UNASSIGNED shards (Elasticsearch 5.6)


(Serg) #1

After a reboot, I got some unassigned shards.

logstash-2017.12.08 3 p UNASSIGNED
logstash-2017.12.08 3 r UNASSIGNED
logstash-2017.12.08 4 p UNASSIGNED
logstash-2017.12.08 4 r UNASSIGNED
logstash-2017.12.08 2 p UNASSIGNED
logstash-2017.12.08 2 r UNASSIGNED
logstash-2017.12.08 1 p UNASSIGNED
logstash-2017.12.08 1 r UNASSIGNED
logstash-2017.12.08 0 p UNASSIGNED
logstash-2017.12.08 0 r UNASSIGNED

Now I am trying to assign them, but I get:

root@04elasticsearch:~# curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{ "commands" : [ { "index" : "logstash-2017.12.08", "allocate" : { "shard" : 3, "node" : "7vuMfymHTTOzxlcTtAkk9g", "allow_primary" : true } } ]}' | jq .
{
"status": 400,
"error": {
"caused_by": {
"col": 31,
"line": 1,
"reason": "Unknown AllocationCommand [index]",
"type": "unknown_named_object_exception"
},
"col": 31,
"line": 1,
"reason": "[cluster_reroute] failed to parse field [commands]",
"type": "parsing_exception",
"root_cause": [
{
"col": 31,
"line": 1,
"reason": "Unknown AllocationCommand [index]",
"type": "unknown_named_object_exception"
}
]
}
}
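A note on the error above: in 5.x, each entry in "commands" must be an object whose single key is a command name (move, cancel, allocate_replica, allocate_stale_primary, or allocate_empty_primary), so the parser rejects the top-level "index" key with "Unknown AllocationCommand [index]". For a primary with no surviving copy, the 5.x equivalent of the old allocate-with-allow_primary request would be along these lines; note this sketch is destructive, since it brings the shard up empty:

POST /_cluster/reroute
{
  "commands" : [
    {
      "allocate_empty_primary" : {
        "index" : "logstash-2017.12.08",
        "shard" : 3,
        "node" : "7vuMfymHTTOzxlcTtAkk9g",
        "accept_data_loss" : true
      }
    }
  ]
}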


(Serg) #2

After that, I tried:

GET /_cluster/allocation/explain
{
"index": "logstash-2017.12.05",
"shard": 0,
"primary": true
}

"index": "logstash-2017.12.05",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2017-12-25T18:48:27.248Z",
"failed_allocation_attempts": 1,
"details": "failed recovery, failure RecoveryFailedException[[logstash-2017.12.05][0]: Recovery failed on {7vuMfym}{7vuMfymHTTOzxlcTtAkk9g}{e4mS1ZIKS5aJHUA3PHTx3g}{10.3.2.45}{10.3.2.45:9300}{ml.max_open_jobs=10, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: FileNotFoundException[no segments* file found in store(mmapfs(/vol/nodes/0/indices/AgVIeZ2cQji5SSFlnmS8dw/0/index)): files: []]; ",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
{
"node_id": "7vuMfymHTTOzxlcTtAkk9g",
"node_name": "7vuMfym",
"transport_address": "10.3.2.45:9300",
"node_attributes": {
"ml.max_open_jobs": "10",
"ml.enabled": "true"
},
"node_decision": "no",
"store": {
"in_sync": true,
"allocation_id": "Wx2RRHWjQUWO38H6u8eXnA",
"store_exception": {
"type": "file_not_found_exception",
"reason": "no segments* file found in SimpleFSDirectory@/vol/nodes/0/indices/AgVIeZ2cQji5SSFlnmS8dw/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@4ea20096: files: []"
}
}
},
{
"node_id": "dEcFJ6h_QfyKNr_N7362QQ",
"node_name": "dEcFJ6h",
"transport_address": "10.3.2.39:9300",
"node_attributes": {
"ml.max_open_jobs": "10",
"ml.enabled": "true"
},
"node_decision": "no",
"store": {
"found": false
}
}
]
}
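Given that unassigned_info shows "failed_allocation_attempts": 1 with reason ALLOCATION_FAILED, one low-risk thing to try first (assuming the on-disk copy is actually usable) is asking the cluster to retry allocations that hit the retry limit:

POST /_cluster/reroute?retry_failed=true

If the store is genuinely empty, as the FileNotFoundException ("no segments* file found ... files: []") suggests, the remaining options are restoring the index from a snapshot or forcing an empty primary with allocate_empty_primary and accepting the data loss.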


(Serg) #3

Also, I tried:

POST /logstash-2017.12.05/_close
POST /logstash-2017.12.05/_open

and

PUT /logstash-2017.12.05/_settings
{
"index" : {
"number_of_replicas" : 0
}
}

and then

PUT /logstash-2017.12.05/_settings
{
"index" : {
"number_of_replicas" : 1
}
}


(Serg) #4

and

POST /_cluster/reroute
{
"commands" : [
{
"move" : {
"index" : "logstash-2017.12.05", "shard" : 0,
"from_node" : "dEcFJ6h_QfyKNr_N7362QQ", "to_node" : "dEcFJ6h_QfyKNr_N7362QQ"
}
},
{
"allocate_replica" : {
"index" : "logstash-2017.12.05", "shard" : 1,
"node" : "dEcFJ6h_QfyKNr_N7362QQ"
}
}
]
}

But in this case I don't know what I should put in the "from_node" field, since this index is not assigned to any node.
By the way, allocation is enabled for all indices.
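As a side note, the cat shards API can show the recorded reason each shard is unassigned, which helps decide between allocate_replica (for replicas whose primary is alive) and snapshot restore or allocate_empty_primary (for primaries with no valid copy). A sketch:

GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason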


(andy_zhou) #5

You can solve this, as far as I know.
I think your cluster has many shards.
I tested this in my cluster; the fastest fix I found was to close the index and open it again.


(Serg) #6

I don't understand; please explain how to fix it.


(Christian Dahlqvist) #7

What is the full output of the cluster stats API?


(Serg) #8

{
"_nodes": {
"total": 2,
"successful": 2,
"failed": 0
},
"cluster_name": "prod",
"timestamp": 1514277344003,
"status": "red",
"indices": {
"count": 81,
"shards": {
"total": 409,
"primaries": 205,
"replication": 0.9951219512195122,
"index": {
"shards": {
"min": 1,
"max": 10,
"avg": 5.049382716049383
},
"primaries": {
"min": 1,
"max": 5,
"avg": 2.5308641975308643
},
"replication": {
"min": 0,
"max": 1,
"avg": 0.9876543209876543
}
}
},
"docs": {
"count": 445304538,
"deleted": 26909
},
"store": {
"size": "712.4gb",
"size_in_bytes": 765019382231,
"throttle_time": "0s",
"throttle_time_in_millis": 0
},
"fielddata": {
"memory_size": "12.7kb",
"memory_size_in_bytes": 13104,
"evictions": 0
},
"query_cache": {
"memory_size": "0b",
"memory_size_in_bytes": 0,
"total_count": 0,
"hit_count": 0,
"miss_count": 0,
"cache_size": 0,
"cache_count": 0,
"evictions": 0
},
"completion": {
"size": "0b",
"size_in_bytes": 0
},
"segments": {
"count": 8011,
"memory": "1.8gb",
"memory_in_bytes": 2034123647,
"terms_memory": "1.6gb",
"terms_memory_in_bytes": 1733836573,
"stored_fields_memory": "188.9mb",
"stored_fields_memory_in_bytes": 198105344,
"term_vectors_memory": "0b",
"term_vectors_memory_in_bytes": 0,
"norms_memory": "295.4kb",
"norms_memory_in_bytes": 302528,
"points_memory": "15.9mb",
"points_memory_in_bytes": 16733358,
"doc_values_memory": "81.2mb",
"doc_values_memory_in_bytes": 85145844,
"index_writer_memory": "46.1mb",
"index_writer_memory_in_bytes": 48391088,
"version_map_memory": "81.9kb",
"version_map_memory_in_bytes": 83886,
"fixed_bit_set": "72kb",
"fixed_bit_set_memory_in_bytes": 73728,
"max_unsafe_auto_id_timestamp": 1514246409906,
"file_sizes": {}
}
},
"nodes": {
"count": {
"total": 2,
"data": 2,
"coordinating_only": 0,
"master": 2,
"ingest": 2
},
"versions": [
"5.5.2"
],
"os": {
"available_processors": 16,
"allocated_processors": 16,
"names": [
{
"name": "Linux",
"count": 2
}
],
"mem": {
"total": "119.9gb",
"total_in_bytes": 128781852672,
"free": "773.6mb",
"free_in_bytes": 811261952,
"used": "119.1gb",
"used_in_bytes": 127970590720,
"free_percent": 1,
"used_percent": 99
}
},
"process": {
"cpu": {
"percent": 5
},
"open_file_descriptors": {
"min": 740,
"max": 758,
"avg": 749
}
},
"jvm": {
"max_uptime": "15.6h",
"max_uptime_in_millis": 56253923,
"versions": [
{
"version": "1.8.0_144",
"vm_name": "Java HotSpot(TM) 64-Bit Server VM",
"vm_version": "25.144-b01",
"vm_vendor": "Oracle Corporation",
"count": 2
}
],
"mem": {
"heap_used": "25.5gb",
"heap_used_in_bytes": 27453966464,
"heap_max": "63.8gb",
"heap_max_in_bytes": 68580016128
},
"threads": 230
},
"fs": {
"total": "3.4tb",
"total_in_bytes": 3740103417856,
"free": "2.7tb",
"free_in_bytes": 2969643327488,
"available": "2.5tb",
"available_in_bytes": 2779609776128
},
"plugins": [
{
"name": "ingest-geoip",
"version": "5.5.2",
"description": "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
"classname": "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
"has_native_controller": false
},
{
"name": "repository-s3",
"version": "5.5.2",
"description": "The S3 repository plugin adds S3 repositories",
"classname": "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
"has_native_controller": false
},
{
"name": "x-pack",
"version": "5.5.2",
"description": "Elasticsearch Expanded Pack Plugin",
"classname": "org.elasticsearch.xpack.XPackPlugin",
"has_native_controller": true
}
],
"network_types": {
"transport_types": {
"security4": 2
},
"http_types": {
"security4": 2
}
}
}
}


(Christian Dahlqvist) #9

It doesn't look like you have a crazy amount of shards, which can often cause these kinds of problems. How many shards are still UNASSIGNED?

Is there anything in the logs?

Do you have minimum_master_nodes set to 2?


(Serg) #11

curl -s http://localhost:9200/_cat/shards | grep UNASS | wc -l
169

GET /_nodes/_master
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "prod",
"nodes": {
"dEcFJ6h_QfyKNr_N7362QQ": {
"name": "dEcFJ6h",


(Christian Dahlqvist) #12

@radi Please open a separate thread for your completely unrelated question.


(Christian Dahlqvist) #13

Can you please check this in your elasticsearch.yml config file?


(Serg) #14

No, I don't have it set:

root@04elasticsearch:/home/ubuntu# cat /etc/elasticsearch/elasticsearch.yml | grep -i master


(Serg) #15

cluster.name: prod
path.data: /vol
path.logs: /vol/logs
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10......","10......"]
xpack:
security:
authc:
realms:
native1:
type: native
order: 0

All my conf


(Christian Dahlqvist) #16

If you have 2 (or 3) master-eligible nodes, you need to set minimum_master_nodes to 2 in order to avoid split-brain scenarios. This means that the cluster will go red as soon as one node is missing, which is the correct behaviour in order to prevent data loss. If you need the cluster to be able to operate with one node down or unavailable, you need a minimum of 3 master-eligible nodes (which allows a majority of nodes to elect a master even with one node missing).

You might therefore be experiencing a split-brain scenario, preventing the shards from being found and allocated.

Fix this and see if that allows the shards to get allocated.
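Note that a transient cluster setting is lost on a full cluster restart. To make the change permanent, the usual approach is to also set it in elasticsearch.yml on every master-eligible node, e.g.:

discovery.zen.minimum_master_nodes: 2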


(Serg) #17

PUT _cluster/settings
{
"transient": {
"discovery.zen.minimum_master_nodes": 2
}
}

{
"acknowledged": true,
"persistent": {},
"transient": {
"discovery": {
"zen": {
"minimum_master_nodes": "2"
}
}
}
}

But I got the same result.

I found that some indices have a size value shown, but some don't.

logstash-2017.12.08 2 p UNASSIGNED

For the indices that had a size value I tried a reindex, and that looks like it helped.
But all the others don't work that way.

For example:

POST _reindex
{
"source": {
"index": "logstash-2017.12.08"
},
"dest": {
"index": "logstash-2017.12.33"
}
}

{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": []
},
"status": 503
}


(Christian Dahlqvist) #18

What is the output of the cat nodes API?


(Serg) #19

ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.3.2.39 9 98 6 0.07 0.30 0.81 mdi - dEcFJ6h
10.3.2.45 9 95 6 0.07 0.34 0.65 mdi * 7vuMfym


(Christian Dahlqvist) #20

Not sure I understood what you mean...

Now that both nodes are part of the same cluster, are shards getting allocated in the background? Do you see anything in the logs?


(Serg) #21

Normal shards have a size column:

logstash-2017.11.14 0 p STARTED 3516457 3.2gb 10.3.2.45 7vuMfym

but UNASSIGNED shards don't:

logstash-2017.12.17 1 p UNASSIGNED