Reassigning Shards

jnpetty · April 28, 2016, 9:07pm

We had to hard shut down all of our Elasticsearch servers due to an environmental issue. Now all of our shards are unassigned:

{
  "cluster_name" : "Elasticsearch-Cluster-1",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 101,
  "active_shards" : 101,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 109,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 48.095238095238095
}

logstash-2016.04.27     1 r UNASSIGNED
logstash-2016.04.27     0 r UNASSIGNED
logstash-2016.04.28     2 p UNASSIGNED
logstash-2016.04.28     2 r UNASSIGNED
logstash-2016.04.28     1 p UNASSIGNED
logstash-2016.04.28     1 r UNASSIGNED
logstash-2016.04.28     0 p UNASSIGNED
logstash-2016.04.28     0 r UNASSIGNED
.marvel-es-2016.04.28   0 p UNASSIGNED
.marvel-es-2016.04.28   0 r UNASSIGNED
.marvel-es-data-1       0 r UNASSIGNED
.marvel-es-2016.04.25   0 r UNASSIGNED
.marvel-es-2016.04.24   0 r UNASSIGNED
.marvel-es-2016.04.27   0 r UNASSIGNED
.marvel-es-2016.04.26   0 r UNASSIGNED
logstash-2016.03.29     2 r UNASSIGNED
logstash-2016.03.29     1 r UNASSIGNED
logstash-2016.03.29     0 r UNASSIGNED
logstash-2016.03.28     2 r UNASSIGNED
logstash-2016.03.28     1 r UNASSIGNED
logstash-2016.03.28     0 r UNASSIGNED
.marvel-es-1-2016.04.28 0 r UNASSIGNED
.marvel-es-2016.03.30   0 r UNASSIGNED
.marvel-es-2016.03.31   0 r UNASSIGNED
.marvel-es-2016.04.01   0 r UNASSIGNED
.marvel-es-2016.03.28   0 r UNASSIGNED
.marvel-es-2016.03.29   0 r UNASSIGNED
logstash-2016.04.01     2 r UNASSIGNED
logstash-2016.04.01     1 r UNASSIGNED
logstash-2016.04.01     0 r UNASSIGNED
logstash-2016.04.02     2 r UNASSIGNED
logstash-2016.04.02     1 r UNASSIGNED
logstash-2016.04.02     0 r UNASSIGNED
logstash-2016.03.31     2 r UNASSIGNED
logstash-2016.03.31     1 r UNASSIGNED
logstash-2016.03.31     0 r UNASSIGNED
logstash-2016.04.03     2 r UNASSIGNED
logstash-2016.04.03     1 r UNASSIGNED
logstash-2016.04.03     0 r UNASSIGNED
logstash-2016.03.30     2 r UNASSIGNED
logstash-2016.03.30     1 r UNASSIGNED
logstash-2016.03.30     0 r UNASSIGNED
logstash-2016.04.04     2 r UNASSIGNED
logstash-2016.04.04     1 r UNASSIGNED
logstash-2016.04.04     0 r UNASSIGNED
logstash-2016.04.09     2 r UNASSIGNED
logstash-2016.04.09     1 r UNASSIGNED
logstash-2016.04.09     0 r UNASSIGNED
logstash-2016.04.05     2 r UNASSIGNED
logstash-2016.04.05     1 r UNASSIGNED
logstash-2016.04.05     0 r UNASSIGNED
logstash-2016.04.06     2 r UNASSIGNED
logstash-2016.04.06     1 r UNASSIGNED
logstash-2016.04.06     0 r UNASSIGNED
logstash-2016.04.07     2 r UNASSIGNED
logstash-2016.04.07     1 r UNASSIGNED
logstash-2016.04.07     0 r UNASSIGNED
logstash-2016.04.08     2 r UNASSIGNED
logstash-2016.04.08     1 r UNASSIGNED
logstash-2016.04.08     0 r UNASSIGNED
.marvel-es-2016.04.10   0 r UNASSIGNED
.marvel-es-2016.04.12   0 r UNASSIGNED
.marvel-es-2016.04.11   0 r UNASSIGNED
.marvel-es-2016.04.07   0 r UNASSIGNED
.marvel-es-2016.04.06   0 r UNASSIGNED
.marvel-es-2016.04.09   0 r UNASSIGNED
.marvel-es-2016.04.08   0 r UNASSIGNED
.kibana                 0 r UNASSIGNED
.marvel-es-2016.04.03   0 r UNASSIGNED
.marvel-es-2016.04.02   0 r UNASSIGNED
.marvel-es-2016.04.05   0 r UNASSIGNED
.marvel-es-2016.04.04   0 r UNASSIGNED
logstash-2016.04.12     2 r UNASSIGNED
logstash-2016.04.12     1 r UNASSIGNED
logstash-2016.04.12     0 r UNASSIGNED
logstash-2016.04.13     2 r UNASSIGNED
logstash-2016.04.13     1 r UNASSIGNED
logstash-2016.04.13     0 r UNASSIGNED
logstash-2016.04.14     2 r UNASSIGNED
logstash-2016.04.14     1 r UNASSIGNED
logstash-2016.04.14     0 r UNASSIGNED

Anyone have any suggestions on how to best bring them back into the cluster?
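
Before forcing anything, it can help to know how many of the unassigned shards are primaries and how many are replicas. The sketch below is only an illustration: the function name count_unassigned is made up here, and it assumes the four-column layout (index, shard, prirep, state) shown in the listing above.

```shell
# Summarise UNASSIGNED rows from `_cat/shards` output read on stdin.
# Assumed columns: index shard prirep state (prirep is "p" or "r").
count_unassigned() {
    awk '$4 == "UNASSIGNED" { n[$3]++ }
         END { printf "primaries=%d replicas=%d\n", n["p"], n["r"] }'
}

# Offline demo with three rows copied from the listing above.
count_unassigned <<'EOF'
logstash-2016.04.27     1 r UNASSIGNED
logstash-2016.04.28     2 p UNASSIGNED
logstash-2016.04.28     2 r UNASSIGNED
EOF
```

Against a live cluster the same function could be fed with curl -s http://localhost:9200/_cat/shards | count_unassigned.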

jnpetty · April 28, 2016, 9:45pm

I found the following command that is supposed to go thru each unassigned shard and assign it.

for shard in $(curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $2}'); do
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "t37", 
                  "shard" : $shard, 
                  "node" : "datanode15", 
                  "allow_primary" : true
              }
            }
        ]
    }'
    sleep 5
done

But when I run the command I get the following error:

{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Unrecognized token '$shard': was expecting ('true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@4351a6c2; line: 5, column: 36]"}],"type":"json_parse_exception","reason":"Unrecognized token '$shard': was expecting ('true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@4351a6c2; line: 5, column: 36]"},"status":500}

rusty · April 29, 2016, 10:03am

Hi! Look at the logs on the data nodes and on the active master: are there any errors or warnings about shards?
Another point: look at the filesystem level on the nodes. Are the shard files there, or are the shard directories just empty?

rusty · April 29, 2016, 10:47am

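There is an error in the script: $shard sits inside a single-quoted string, so the shell does not expand it as a variable. Just change $shard to '$shard'.

Try something like this (you need to pass INDEX and SHARD, not only SHARD, and pick a target NODE yourself, because UNASSIGNED rows in _cat/shards have no node column to parse):

#!/bin/bash
# Target node name: a placeholder, replace it with one of your own data nodes.
node="NODE_NAME"
for line in $(curl -s -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{printf "%s;%s\n", $1, $2}' );  do
    IFS=';' read -r index shard <<< "$line"
    # The single-quoted JSON is closed around each variable so the shell expands it.
    # Careful: allow_primary can allocate an empty primary, which loses that shard's data.
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "'"$index"'",
                  "shard" : '"$shard"',
                  "node" : "'"$node"'",
                  "allow_primary" : true
              }
            }
        ]
    }'
    sleep 5
done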
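
The quoting fix can be checked without touching the cluster. This is a minimal sketch with a made-up shard number; it only prints the two JSON bodies so the difference is visible.

```shell
shard=2

# Inside single quotes the shell leaves $shard untouched,
# which is what produced the json_parse_exception.
literal='{"shard" : $shard}'

# Closing the single quotes around the variable lets the shell expand it.
expanded='{"shard" : '"$shard"'}'

printf '%s\n' "$literal" "$expanded"
# prints:
# {"shard" : $shard}
# {"shard" : 2}
```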

jnpetty · April 29, 2016, 3:09pm

The struggle continues:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[p-es-1][192.168.56.86:9300][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate] allocation of [logstash-2016.04.20][2] on node {p-es-1}{uUw7qMvGSCqyILY1IbT9NQ}{192.168.56.86}{192.168.56.86:9300} is not allowed, reason: [YES(target node version [2.3.2] is same or newer than source node version [2.3.2])][NO(shard cannot be allocated on same node [uUw7qMvGSCqyILY1IbT9NQ] it already exists on)][YES(shard not primary or relocation disabled)][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][NO(more than allowed [85.0%] used disk on node, free: [14.640832792774445%])][YES(below shard recovery limit of [2])][YES(primary is already active)][YES(no allocation awareness enabled)][YES(node passes include/exclude/require filters)][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)]"},"status":400}

The line that sticks out is: "shard cannot be allocated on same node [uUw7qMvGSCqyILY1IbT9NQ] it already exists on". I tried putting it on all of our hosts and ended up with the same error.
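
The reason string lists one verdict per allocation decider, and only the NO entries block the allocation. A small filter can pull them out; this is a sketch, the function name failed_deciders is hypothetical, and it assumes the verdicts contain no nested parentheses, as in the message above.

```shell
# Print only the NO(...) verdicts from a reroute error message read on stdin.
failed_deciders() {
    grep -o 'NO([^)]*)'
}

# Offline demo with an abbreviated copy of the error above.
echo '[YES(primary is already active)][NO(shard cannot be allocated on same node [uUw7qMvGSCqyILY1IbT9NQ] it already exists on)][NO(more than allowed [85.0%] used disk on node, free: [14.640832792774445%])]' | failed_deciders
```

For this message the filter returns two entries: the same-node check and the disk usage check.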

rusty · April 29, 2016, 3:31pm

Also pay attention to this message:

[NO(more than allowed [85.0%] used disk on node, free: [14.640832792774445%])]

Maybe you hit the free space limit?

jnpetty · April 29, 2016, 3:35pm

I don't believe so, since the index that I am trying to assign is only 800mb and the smallest free size is around 15gb.

rusty · April 29, 2016, 3:41pm

What is the output of df -h? As you can see, the message gives a percentage, not absolute numbers, so it doesn't matter how much free space you have compared to the shard size: the node simply has less than 15% free space.
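
To see the percentage the way the message reports it, the df output can be converted to percent free per mount point. This is a rough sketch: the function name low_free_space and the sample numbers are invented, it assumes POSIX df -Pk column order, and it only approximates what Elasticsearch computes internally.

```shell
# Usage: df -Pk | low_free_space [min_free_percent]
# Flags mount points whose free space is below the given percentage (default 15).
low_free_space() {
    awk -v min="${1:-15}" 'NR > 1 && $2 > 0 {
        free = $4 * 100 / $2
        printf "%s %.1f%% free %s\n", $6, free, (free < min ? "BELOW" : "ok")
    }'
}

# Offline demo with made-up df -Pk output.
low_free_space 15 <<'EOF'
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sdb1 1000000 853600 146400 86% /data
/dev/sda1 500000 100000 400000 20% /
EOF
```

In the demo, /data has plenty of kilobytes left in absolute terms but is still flagged, because only the ratio matters.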

rusty · April 29, 2016, 3:45pm

https://www.elastic.co/guide/en/elasticsearch/reference/2.3/disk-allocator.html

jnpetty · April 29, 2016, 5:01pm

Ok let me expand the drives and give it another shot.

rusty · April 29, 2016, 5:37pm

In fact, if you know your data rate, you can disable the check completely by setting cluster.routing.allocation.disk.threshold_enabled to false,
or change cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high to higher percent values or to an absolute byte value (like 900mb).
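
These settings can be changed at runtime through the cluster settings API. The sketch below only builds the request body; the helper name watermark_payload and the 90%/95% values are illustrative assumptions, not recommendations, and the curl call is left commented out.

```shell
# Build a transient cluster-settings body for the two disk watermarks.
# $1 = low watermark, $2 = high watermark (percent used, or an absolute free-space value).
watermark_payload() {
    printf '{"transient":{"cluster.routing.allocation.disk.watermark.low":"%s","cluster.routing.allocation.disk.watermark.high":"%s"}}\n' "$1" "$2"
}

payload=$(watermark_payload 90% 95%)
echo "$payload"

# To apply it to a live cluster (not run here):
# curl -XPUT 'http://localhost:9200/_cluster/settings' -d "$payload"
```

A transient setting is lost on a full cluster restart, which makes it a reasonably safe way to try a value before committing it to the configuration.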

jnpetty · April 29, 2016, 8:27pm

I got it all resolved! Thanks for your help.

Ned · October 3, 2016, 4:12pm

Hi Jnpetty, I seem to be experiencing the same issue as you did. What finally ended up working for you?

jnpetty · October 3, 2016, 4:40pm

Hi Ned,

I actually didn't make any changes to the cluster configuration. I didn't realize that when free disk space on a node gets below 15%, shard allocation to that node stops completely. In our case we still had several hundred GB free, but it was below 15%. So now I just monitor disk space a little closer and make sure indexes are being deleted. Once I rebuild the cluster in production I may set an absolute value based on our disk space.
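
Because the daily indices carry their date in the name, picking deletion candidates can be a plain string comparison. This is a sketch under assumptions: the function name old_logstash_indices and the cutoff date are made up, and it expects one index name per line in the logstash-YYYY.MM.DD pattern seen earlier in the thread.

```shell
# Usage: old_logstash_indices CUTOFF   (CUTOFF as YYYY.MM.DD)
# Reads index names on stdin and prints logstash-* indices dated before CUTOFF.
old_logstash_indices() {
    awk -v cutoff="$1" '$1 ~ /^logstash-[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ {
        day = substr($1, 10)            # text after the "logstash-" prefix
        if ((day "") < (cutoff "")) print $1   # forced string comparison
    }'
}

# Offline demo with names taken from the listing in the first post.
old_logstash_indices 2016.04.01 <<'EOF'
logstash-2016.03.28
logstash-2016.04.14
.kibana
EOF
```

The function only lists candidates; reviewing that list by hand before deleting anything keeps the step reversible.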

Ned · October 4, 2016, 2:35am

Gotcha, thanks for your quick response! I wonder if there is any danger in turning off the disk allocation decider.

jnpetty · October 4, 2016, 3:28pm

In theory, as long as your data is on a different file-system, it probably won't make a difference. I think the setting is only in place to protect you from filling up your root file-system and halting your system.
