Unallocated shards when a node is removed

vangap · May 18, 2016, 11:41am

ES 2.1
AWS EC2, Private subnets.

Whenever I remove a node, the shards that belonged to it don't get rebalanced.

I think sometimes resetting the number of replicas solves the issue.

Most of the time I end up restarting nodes.

This happens consistently, every time.
On the master node I see this:

delaying recovery of [7bs2em65onshoizh][1] as it is not listed as assigned to target node {Impala}{HXQsJaPeTP2DTVY7WzA_4Q}

Any ideas why this might be happening?

warkolm · May 18, 2016, 12:10pm

Do you have allocation disabled?
Check _cluster/settings
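That check can be scripted. The following is a minimal sketch. It assumes the node answers on localhost:9200, as in the commands later in this thread, and the function name allocation_restricted is hypothetical. With flat_settings=true the response uses dotted keys, so a grep can detect cluster.routing.allocation.enable set to none, primaries or new_primaries. Any of those values would keep replicas unassigned.

```shell
#!/bin/sh
# Hypothetical helper: read _cluster/settings JSON (requested with flat_settings=true)
# on stdin and succeed if cluster.routing.allocation.enable restricts allocation.
# Both persistent and transient sections are covered, since grep scans the whole body.
allocation_restricted() {
  grep -Eq '"cluster\.routing\.allocation\.enable" *: *"(none|primaries|new_primaries)"'
}

# Usage against a live node (assumes localhost:9200):
# curl -s 'localhost:9200/_cluster/settings?flat_settings=true' | allocation_restricted \
#   && echo "allocation is restricted" || echo "allocation is not restricted"
```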

vangap · May 18, 2016, 12:35pm

Allocation is not disabled.

_cluster/settings shows:
{"persistent":{},"transient":{}}

_cat/shards shows UNASSIGNED shards until I restart one of the remaining nodes.
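To list only the stuck shard copies, filter the _cat/shards output. The sketch below assumes the columns are requested explicitly with h=index,shard,prirep,state, so the state is always the fourth field. The function name unassigned_shards is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only UNASSIGNED rows from _cat/shards output
# requested with the columns index,shard,prirep,state (in that order).
# Prints "index shard prirep" for each unassigned copy (prirep is p or r).
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage against a live node (assumes localhost:9200):
# curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state' | unassigned_shards
```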

vangap · May 18, 2016, 1:29pm

More info

delayed_timeout is at its default of 1m.

None of the remaining nodes has hit a disk space watermark.
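For context, delayed_timeout is the index-level setting index.unassigned.node_left.delayed_timeout. It delays reallocating replica shards that became unassigned because a node left the cluster. You can change it through the index settings API, and a value of 0 turns the delay off so the copies are reallocated right away. The sketch below only builds the request body. The function name and the 5m value in the usage line are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body that sets the delayed-allocation timeout.
# $1 is a time value such as 1m, 5m or 0.
delayed_timeout_body() {
  printf '{"settings":{"index.unassigned.node_left.delayed_timeout":"%s"}}\n' "$1"
}

# Usage (assumes localhost:9200; _all applies it to every index):
# curl -XPUT 'localhost:9200/_all/_settings' -d "$(delayed_timeout_body 5m)"
```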

_cluster/health

{
  "cluster_name": "abc",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_primary_shards": 222,
  "active_shards": 296,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 148,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 66.66666666666666
}
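The counts in that health output are internally consistent, which helps narrow things down. 296 active plus 148 unassigned is 444, exactly 2 × 222 primaries, which is consistent with one replica per shard. The status is yellow and all 222 primaries are active, so the 148 unassigned copies are replicas. delayed_unassigned_shards is 0, so none of them is still waiting on the delayed-allocation timeout. The sketch below recomputes active_shards_percent_as_number. It assumes the percentage is active copies over all copies (active + initializing + unassigned), which reproduces the reported value. The function name is hypothetical.

```shell
#!/bin/sh
# Recompute active_shards_percent_as_number from _cluster/health counts.
# Assumption: total copies = active + initializing + unassigned
# (relocating shards are already counted among the active ones).
shards_percent() {
  awk -v a="$1" -v i="$2" -v u="$3" 'BEGIN { printf "%.2f\n", 100 * a / (a + i + u) }'
}

# active_shards=296, initializing_shards=0, unassigned_shards=148
shards_percent 296 0 148   # prints 66.67
```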

vangap · May 19, 2016, 8:54am

Moving the discussion over from GitHub issue #1 on alicegoldfuss/shardnado ("Add a notice note to README").

@s1monw, you wrote:

"I am not sure what this is. I think you should check the settings on your indices and cluster, use the explain feature on the _reroute API and first find out why before you fix the why."

By the way, I didn't use the shardnado tool. I was just saying that sometimes ES doesn't allocate shards on its own.

Our cluster is a basic one; it doesn't have any routing values set up or anything like that.
As posted in earlier comments in this thread, the cluster settings are empty. I haven't disabled allocation.

I also checked the settings of one of the indices (/index-name/_settings); there is nothing useful there about shards.

I can reproduce this, happens every time I remove a node.

Please let me know how I can use _reroute to debug this further.
Thanks.

s1monw · May 19, 2016, 9:10am

Hey, the _reroute API allows you to manually reroute shards to a node. You can pass a parameter called explain=true, which gives you the reasons why an allocation could or could not be applied. That should tell you why shards are not allocated and/or whether they are throttled. If you call the API with an empty body, you trigger a new round of rerouting and get some information about all the shards, which should give you a much better idea of what is going on. If you paste that output here, I can take a look. Also send me the settings of the index that is not allocating.

simon
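The empty-body reroute with explain that Simon describes can be wrapped in a small shell function. This is a sketch that assumes the node is reachable at localhost:9200, which you can override with ES_HOST. Both empty_reroute and ES_HOST are hypothetical names. Extra arguments are appended as query flags. Adding dry_run computes the outcome without applying it, the same mode used for the gist later in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: trigger a reroute round with an empty body.
# Each argument (e.g. explain, dry_run) is appended as a query flag.
ES_HOST="${ES_HOST:-localhost:9200}"

empty_reroute() {
  params="pretty"
  for flag in "$@"; do
    params="$params&$flag"
  done
  # No -d: the request body is empty, which just runs a new allocation round.
  curl -s -XPOST "http://$ES_HOST/_cluster/reroute?$params"
}

# Usage: preview the allocation decisions without changing the cluster.
# empty_reroute explain dry_run
```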

vangap · May 19, 2016, 9:18am

OK, so I ran this command:

curl -XPOST 'localhost:9200/_cluster/reroute?pretty&explain' -d '{
    "commands" : [ 
        {
          "allocate" : {
              "index" : "1qp6axstrjub7ouw", "shard" : 1, "node" : "Nimrod"
          }
        }
    ]
}'

This allocated the replica of shard 1, which was unassigned before, to the node Nimrod. All the remaining unassigned shards also got allocated after this.

Here is the output of that command in dry_run mode, if that is of any use:
https://gist.github.com/vanga/146c356c20758765afd22c4c739df572
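The allocate command from that post can be parameterized so the same call works for any stuck shard copy. The sketch below rebuilds the ES 2.x allocate body shown above. It is meant to be sent with dry_run first, so the explanation can be reviewed before anything moves. The function name is hypothetical. The index, shard and node in the usage line are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical helper: build the ES 2.x "allocate" reroute command for one shard copy.
# $1 = index name, $2 = shard number, $3 = target node name.
allocate_body() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
    "$1" "$2" "$3"
}

# Usage (assumes localhost:9200): preview with dry_run, then drop dry_run to apply.
# curl -XPOST 'localhost:9200/_cluster/reroute?pretty&explain&dry_run' \
#   -d "$(allocate_body 1qp6axstrjub7ouw 1 Nimrod)"
```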

s1monw · May 19, 2016, 9:41am

Can I see the index settings of that index, 1qp6axstrjub7ouw?

s1monw · May 19, 2016, 9:47am

You wrote: "here is the output of that command in dry_run mode if that is of any use"

I think this is a bug in delayed allocation where it fails to kick off another round of shard allocation. Is there a chance for you to upgrade to 2.3 at some point?
In the meantime, you can simulate that missing round of allocation by calling _reroute with an empty body.

vangap · May 19, 2016, 9:51am

Here are the settings of that index:

[root@ip-172-31-48-137 ~]# curl localhost:9200/1qp6axstrjub7ouw/_settings?pretty
{
  "1qp6axstrjub7ouw" : {
    "settings" : {
      "index" : {
        "creation_date" : "1446015175417",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "gOZtw-IGS2ap2Cz6tJt4MA",
        "version" : {
          "created" : "2000051"
        }
      }
    }
  }
}

We will eventually move to 2.3 (maybe in a few weeks, or in 1-2 months; not sure yet). I saw there are some breaking changes between 2.1 and 2.2, so we need to look into them and plan.

s1monw · May 19, 2016, 9:52am

OK, so can you provoke this problem again and see if an empty reroute fixes it?

vangap · May 19, 2016, 10:01am

OK. Running

curl -XPOST 'localhost:9200/_cluster/reroute'

does allocate those unassigned shards.
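Until the upgrade happens, the workaround confirmed here could be automated. The sketch below is a blunt heuristic and an assumption, not something proposed in the thread. It reads _cluster/health and sends the same empty reroute when shards are unassigned but delayed_unassigned_shards is 0, which is the stuck state seen in the health output earlier in this thread. It also fires when replicas genuinely cannot be placed, and in that case the reroute simply changes nothing. ES_HOST and the function names are hypothetical. The sed-based parsing assumes the compact (non-pretty) JSON that _cluster/health returns by default.

```shell
#!/bin/sh
# Hypothetical workaround sketch: issue an empty reroute when shards are unassigned
# but none of them is still waiting on the delayed-allocation timeout.
ES_HOST="${ES_HOST:-localhost:9200}"

# Extract an integer field from compact _cluster/health JSON on stdin.
health_field() {
  sed -n "s/.*\"$1\":\([0-9]*\).*/\1/p"
}

reroute_if_stuck() {
  health=$(curl -s "http://$ES_HOST/_cluster/health")
  unassigned=$(printf '%s\n' "$health" | health_field unassigned_shards)
  delayed=$(printf '%s\n' "$health" | health_field delayed_unassigned_shards)
  if [ "${unassigned:-0}" -gt 0 ] && [ "${delayed:-0}" -eq 0 ]; then
    # Same empty-body reroute that allocated the stuck shards in this thread.
    curl -s -XPOST "http://$ES_HOST/_cluster/reroute" > /dev/null
    echo "rerouted ($unassigned unassigned)"
  else
    echo "nothing to do"
  fi
}
```

You could run reroute_if_stuck from cron or by hand after removing a node. It is only a stopgap until you upgrade.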

s1monw · May 19, 2016, 12:58pm

Perfect, I think you ran into one of those bugs where delayed allocation missed a reroute. Can you please upgrade to the latest version and see if the bug persists? If so, please open an issue on our issue tracker! Thanks!
