Reassigning Shards

We had to hard-shutdown all of our Elasticsearch servers due to an environmental issue. Now over half of our shards are unassigned:

{
  "cluster_name" : "Elasticsearch-Cluster-1",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 101,
  "active_shards" : 101,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 109,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 48.095238095238095
}

logstash-2016.04.27     1 r UNASSIGNED
logstash-2016.04.27     0 r UNASSIGNED
logstash-2016.04.28     2 p UNASSIGNED
logstash-2016.04.28     2 r UNASSIGNED
logstash-2016.04.28     1 p UNASSIGNED
logstash-2016.04.28     1 r UNASSIGNED
logstash-2016.04.28     0 p UNASSIGNED
logstash-2016.04.28     0 r UNASSIGNED
.marvel-es-2016.04.28   0 p UNASSIGNED
.marvel-es-2016.04.28   0 r UNASSIGNED
.marvel-es-data-1       0 r UNASSIGNED
.marvel-es-2016.04.25   0 r UNASSIGNED
.marvel-es-2016.04.24   0 r UNASSIGNED
.marvel-es-2016.04.27   0 r UNASSIGNED
.marvel-es-2016.04.26   0 r UNASSIGNED
logstash-2016.03.29     2 r UNASSIGNED
logstash-2016.03.29     1 r UNASSIGNED
logstash-2016.03.29     0 r UNASSIGNED
logstash-2016.03.28     2 r UNASSIGNED
logstash-2016.03.28     1 r UNASSIGNED
logstash-2016.03.28     0 r UNASSIGNED
.marvel-es-1-2016.04.28 0 r UNASSIGNED
.marvel-es-2016.03.30   0 r UNASSIGNED
.marvel-es-2016.03.31   0 r UNASSIGNED
.marvel-es-2016.04.01   0 r UNASSIGNED
.marvel-es-2016.03.28   0 r UNASSIGNED
.marvel-es-2016.03.29   0 r UNASSIGNED
logstash-2016.04.01     2 r UNASSIGNED
logstash-2016.04.01     1 r UNASSIGNED
logstash-2016.04.01     0 r UNASSIGNED
logstash-2016.04.02     2 r UNASSIGNED
logstash-2016.04.02     1 r UNASSIGNED
logstash-2016.04.02     0 r UNASSIGNED
logstash-2016.03.31     2 r UNASSIGNED
logstash-2016.03.31     1 r UNASSIGNED
logstash-2016.03.31     0 r UNASSIGNED
logstash-2016.04.03     2 r UNASSIGNED
logstash-2016.04.03     1 r UNASSIGNED
logstash-2016.04.03     0 r UNASSIGNED
logstash-2016.03.30     2 r UNASSIGNED
logstash-2016.03.30     1 r UNASSIGNED
logstash-2016.03.30     0 r UNASSIGNED
logstash-2016.04.04     2 r UNASSIGNED
logstash-2016.04.04     1 r UNASSIGNED
logstash-2016.04.04     0 r UNASSIGNED
logstash-2016.04.09     2 r UNASSIGNED
logstash-2016.04.09     1 r UNASSIGNED
logstash-2016.04.09     0 r UNASSIGNED
logstash-2016.04.05     2 r UNASSIGNED
logstash-2016.04.05     1 r UNASSIGNED
logstash-2016.04.05     0 r UNASSIGNED
logstash-2016.04.06     2 r UNASSIGNED
logstash-2016.04.06     1 r UNASSIGNED
logstash-2016.04.06     0 r UNASSIGNED
logstash-2016.04.07     2 r UNASSIGNED
logstash-2016.04.07     1 r UNASSIGNED
logstash-2016.04.07     0 r UNASSIGNED
logstash-2016.04.08     2 r UNASSIGNED
logstash-2016.04.08     1 r UNASSIGNED
logstash-2016.04.08     0 r UNASSIGNED
.marvel-es-2016.04.10   0 r UNASSIGNED
.marvel-es-2016.04.12   0 r UNASSIGNED
.marvel-es-2016.04.11   0 r UNASSIGNED
.marvel-es-2016.04.07   0 r UNASSIGNED
.marvel-es-2016.04.06   0 r UNASSIGNED
.marvel-es-2016.04.09   0 r UNASSIGNED
.marvel-es-2016.04.08   0 r UNASSIGNED
.kibana                 0 r UNASSIGNED
.marvel-es-2016.04.03   0 r UNASSIGNED
.marvel-es-2016.04.02   0 r UNASSIGNED
.marvel-es-2016.04.05   0 r UNASSIGNED
.marvel-es-2016.04.04   0 r UNASSIGNED
logstash-2016.04.12     2 r UNASSIGNED
logstash-2016.04.12     1 r UNASSIGNED
logstash-2016.04.12     0 r UNASSIGNED
logstash-2016.04.13     2 r UNASSIGNED
logstash-2016.04.13     1 r UNASSIGNED
logstash-2016.04.13     0 r UNASSIGNED
logstash-2016.04.14     2 r UNASSIGNED
logstash-2016.04.14     1 r UNASSIGNED
logstash-2016.04.14     0 r UNASSIGNED
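
For reference, output like the above comes from the cluster health and cat shards APIs; something along these lines should reproduce it (assuming Elasticsearch is listening on localhost:9200):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl -XGET 'http://localhost:9200/_cat/shards' | grep UNASSIGNED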

Anyone have any suggestions on how to best bring them back into the cluster?

I found the following command that is supposed to go through each unassigned shard and assign it:

for shard in $(curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{print $2}'); do
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "t37", 
                  "shard" : $shard, 
                  "node" : "datanode15", 
                  "allow_primary" : true
              }
            }
        ]
    }'
    sleep 5
done

But when I run the command I get the following error:

{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Unrecognized token '$shard': was expecting ('true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@4351a6c2; line: 5, column: 36]"}],"type":"json_parse_exception","reason":"Unrecognized token '$shard': was expecting ('true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@4351a6c2; line: 5, column: 36]"},"status":500}

Hi! Look at the data and active-master logs: are there any errors or warnings about the shards?
Another point: check at the filesystem level on the nodes whether the shard files are actually there, or whether the shard directories are just empty.

There is an error in the script: $shard is not recognized as a variable because it sits inside the single-quoted JSON. Just change $shard to '$shard':

Try something like this (you need to parse the INDEX, SHARD, and NODE columns, not only SHARD):

#!/bin/bash
# For every UNASSIGNED shard, pull the index (column 1), shard number (column 2)
# and node (column 8) from _cat/shards, then issue a reroute "allocate" command for it.
# The single-quoted JSON below is closed around $index, $shard and $node so the shell expands them.
# Note: column 8 (the node) can be empty for unassigned shards, in which case a target node must be filled in.
for line in $(curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED | awk '{printf "%s;%s;%s\n", $1, $2, $8}'); do
    IFS=';' read index shard node <<< "$line"
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "'$index'", 
                  "shard" : '$shard', 
                  "node" : "'$node'", 
                  "allow_primary" : true
              }
            }
        ]
    }'
    sleep 5
done
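
Once the script has run, re-checking _cluster/health and _cat/shards will show whether anything changed. If you want to preview a single allocation before applying it, the reroute endpoint also accepts the dry_run and explain query parameters; a minimal sketch, using one of the unassigned shards from the listing above (the node name here is just a placeholder, substitute one of your own):

curl -XPOST 'http://localhost:9200/_cluster/reroute?dry_run&explain' -d '{
    "commands" : [ {
          "allocate" : {
              "index" : "logstash-2016.04.28",
              "shard" : 0,
              "node" : "node-1",
              "allow_primary" : true
          }
        }
    ]
}'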

The struggle continues:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[p-es-1][192.168.56.86:9300][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate] allocation of [logstash-2016.04.20][2] on node {p-es-1}{uUw7qMvGSCqyILY1IbT9NQ}{192.168.56.86}{192.168.56.86:9300} is not allowed, reason: [YES(target node version [2.3.2] is same or newer than source node version [2.3.2])][NO(shard cannot be allocated on same node [uUw7qMvGSCqyILY1IbT9NQ] it already exists on)][YES(shard not primary or relocation disabled)][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][NO(more than allowed [85.0%] used disk on node, free: [14.640832792774445%])][YES(below shard recovery limit of [2])][YES(primary is already active)][YES(no allocation awareness enabled)][YES(node passes include/exclude/require filters)][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)]"},"status":400}

The line that sticks out is: "shard cannot be allocated on same node [uUw7qMvGSCqyILY1IbT9NQ] it already exists on". I tried allocating it to each of our hosts and ended up with the same error.

Also pay attention to this message:

[NO(more than allowed [85.0%] used disk on node, free: [14.640832792774445%])]

Maybe you are being hit by the free-space limit?

I don't believe so, since the index I am trying to assign is only 800 MB and the smallest amount of free space on any node is around 15 GB.

What is the output of df -h? As you can see, the message shows percentages, not absolute numbers, so no matter how much space you have compared to the shard size, the node still has less than 15% free.

https://www.elastic.co/guide/en/elasticsearch/reference/2.3/disk-allocator.html
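
It also helps to check the disk usage Elasticsearch itself sees; the _cat/allocation endpoint reports used and available disk per node (assuming the same localhost:9200 endpoint):

curl -XGET 'http://localhost:9200/_cat/allocation?v'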

Ok let me expand the drives and give it another shot.

In fact, if you know your data rate, you can completely disable the check with cluster.routing.allocation.disk.threshold_enabled,
or change cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high to higher percentage values or to an absolute byte value (like 900mb).
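
For example, a transient cluster settings update along these lines would raise the watermarks (the percentages here are only placeholders; threshold_enabled can be set to false the same way to turn the check off entirely):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.disk.watermark.low" : "90%",
        "cluster.routing.allocation.disk.watermark.high" : "95%"
    }
}'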

I got it all resolved! Thanks for your help.

Hi Jnpetty, I seem to be experiencing the same issue as you did. What finally ended up working for you?

Hi Ned,

I actually didn't make any changes to the cluster configuration. I didn't realize that when free disk space drops below 15%, shard allocation stops completely. In our case we still had several hundred GB free, but it was below 15%. So now I just monitor disk space a little more closely and make sure old indexes are being deleted. Once I rebuild the cluster in production I may set an absolute value based on our disk space.
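
In case it helps, the cleanup itself is nothing fancy: just deleting the oldest daily indexes once they are no longer needed, for example (the index name here is only an example; a tool like Curator can automate the retention policy):

curl -XDELETE 'http://localhost:9200/logstash-2016.03.28'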

Gotcha, thanks for your quick response! I wonder if there is any danger in turning off the disk allocation decider.

In theory, as long as your data is on a different file system, it probably won't make a difference. I think the setting is only in place to protect you from filling up your root file system and halting your system.