How to get back my Primary shards?

klahnakoski · January 23, 2016, 6:47pm

After disabling allocation

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"persistent": {"cluster.routing.allocation.enable": "none"}}'

I shut down one of my nodes for service. Now that it is back up, the primary shards remain unassigned. The data is still on the drive. What is the command to tell ES, "yes, you have a whole shard, please use it"?

I tried a /_cluster/reroute command on a single shard of an unimportant index. With "allow_primary": false it reports an error, and with "allow_primary": true it erases everything on that shard.
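For reference, a minimal sketch of the reroute call described above, assuming the ES 1.x `allocate` command syntax (`index`, `shard`, `node`, `allow_primary`). `ES_URL` is a hypothetical variable defaulting to `http://localhost:9200`. The helper keeps `allow_primary` false unless `true` is passed explicitly, because forcing a primary can allocate an empty shard and discard what was on disk, which matches what happened here.

```shell
#!/bin/sh
# Sketch only: assumes ES 1.x reroute syntax. ES_URL is a hypothetical default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for a single "allocate" reroute command.
# $1 = index, $2 = shard number, $3 = target node, $4 = allow_primary (default false)
reroute_body() {
  printf '{"commands":[{"allocate":{"index":"%s","shard":%d,"node":"%s","allow_primary":%s}}]}' \
    "$1" "$2" "$3" "${4:-false}"
}

# POST the command to the cluster. Passing "true" as the 4th argument may
# allocate an EMPTY primary and lose the shard's existing data.
reroute_shard() {
  curl -s -XPOST "$ES_URL/_cluster/reroute" -d "$(reroute_body "$@")"
}
```

Usage: `reroute_shard my_index 0 node-1`, where `my_index` and `node-1` are placeholders for a real index and node name.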

Thanks

magnusbaeck · January 23, 2016, 6:50pm

You did re-enable allocation after the node came back up, right?
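Re-enabling allocation is the same settings call as the original command, with `all` instead of `none`. A small sketch, assuming the hypothetical `ES_URL` variable; the helper only accepts the documented values of `cluster.routing.allocation.enable` (`all`, `primaries`, `new_primaries`, `none`).

```shell
#!/bin/sh
# Sketch only. ES_URL is a hypothetical default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the persistent settings body; reject anything that is not a valid value.
allocation_body() {
  case "$1" in
    all|primaries|new_primaries|none) ;;
    *) echo "invalid allocation value: $1" >&2; return 1 ;;
  esac
  printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}' "$1"
}

# Apply the setting to the cluster.
set_allocation() {
  body=$(allocation_body "$1") || return 1
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$body"
}
```

Usage: `set_allocation all` once the serviced node has rejoined. On ES 1.7 no Content-Type header is needed; ES 6.0 and later would also require `-H 'Content-Type: application/json'`.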

klahnakoski · January 23, 2016, 7:07pm

My ES version is 1.7.1.

I will re-enable allocation and get back to you.

klahnakoski · January 23, 2016, 7:15pm

After enabling allocation, the serviced node starts replicating shards from the other nodes, but the primary shards remain unassigned.
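To see exactly which primaries are still unassigned, the `_cat/shards` output can be filtered. A sketch assuming the `h=` column selection (`index,shard,prirep,state`) and the hypothetical `ES_URL` variable; the awk filter is kept separate so it can be checked against saved output.

```shell
#!/bin/sh
# Sketch only. ES_URL is a hypothetical default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read "index shard prirep state" lines on stdin and print
# "index shard" for every primary ("p") that is UNASSIGNED.
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Fetch shard states from the cluster and filter them.
list_unassigned_primaries() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state" | unassigned_primaries
}
```

Unassigned replicas (`r`) are ignored on purpose, since they can be rebuilt once their primary is back.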

klahnakoski · January 23, 2016, 7:36pm

Also, once allocation is enabled, I get:

[2016-01-23 19:08:18,962][INFO ][cluster.routing.allocation.decider] [primary] updating [cluster.routing.allocation.enable] from [NONE] to [ALL]
[2016-01-23 19:08:18,970][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,971][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,973][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,973][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,974][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot

klahnakoski · January 23, 2016, 8:06pm

I disabled shard allocation, shut down the problem node, and tried removing the translogs:

sudo find /data1 -name 'translog-*' -exec rm -rf {} \;

I started the node and enabled allocation, but the primary shards still did not come back:


[2016-01-23 19:59:52,626][INFO ][cluster.routing.allocation.decider] [primary] updating [cluster.routing.allocation.enable] from [ALL] to [NONE]
[2016-01-23 19:59:52,659][INFO ][http                     ] [primary] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.31.0.196:9200]}
[2016-01-23 19:59:52,662][INFO ][node                     ] [primary] started
[2016-01-23 20:01:51,095][INFO ][cluster.routing.allocation.decider] [primary] updating [cluster.routing.allocation.enable] from [NONE] to [ALL]
[2016-01-23 20:01:51,108][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,113][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,110][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,131][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,109][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot

klahnakoski · January 23, 2016, 8:44pm

In an attempt to increase the available space on my node, I had changed my YAML config file from

path.data: /data1, /data2, /data3

to

path.data: /data1, /data2, /data3, /data4

Undoing this change and restarting the node brought back all of my primary shards except one.
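Before (and after) editing `path.data`, it can help to inventory which shard directories sit on which path. A sketch that assumes the ES 1.x on-disk layout `<path.data>/<cluster_name>/nodes/<N>/indices/<index>/<shard>/`; because 1.x can spread one shard's files over several data paths, the same shard may show up under more than one path. The function name `list_shard_dirs` is hypothetical.

```shell
#!/bin/sh
# Sketch only: the directory layout below is an assumption based on ES 1.x.
# For each data path given, print "path index shard" for every shard
# directory found under <path>/*/nodes/*/indices/<index>/<shard>/.
list_shard_dirs() {
  for data_path in "$@"; do
    for shard_dir in "$data_path"/*/nodes/*/indices/*/*/; do
      [ -d "$shard_dir" ] || continue          # glob matched nothing
      shard_dir=${shard_dir%/}                 # drop trailing slash
      shard=${shard_dir##*/}                   # last component: shard number
      case $shard in ''|*[!0-9]*) continue ;; esac   # skip _state etc.
      index_dir=${shard_dir%/*}
      index=${index_dir##*/}                   # parent component: index name
      echo "$data_path $index $shard"
    done
  done
}
```

Usage: `list_shard_dirs /data1 /data2 /data3 /data4`. It only reads directories; it does not modify anything or talk to ES.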

klahnakoski · January 25, 2016, 12:55am

In summary: before version 2.0, you cannot change path.data or you will lose data. It appears that older versions of ES striped data across the paths, so changing the number of paths confused it. Luckily, the data was simply considered unusable rather than removed from the drive; reverting path.data let the shards recover.

bleskes · January 25, 2016, 1:34pm

For the record: adding paths to the list shouldn't cause data to go away, so something else is going on here. Also, I know that in the past people (including me :)) recommended deleting the translog when recovery failed, but that only applied to translog corruption on old ES versions, which didn't shut down cleanly, so corruption was expected. Those days are long gone, and newer versions will refuse to open a primary if the translog is missing.

I presume there is no way to dig deeper now and the data is long gone. If not, please ping me and we can investigate.

Related topics

Topic	Category	Replies	Views	Activity
Unassigned primary and replica shards	Elasticsearch	6	2058	July 6, 2017
Allocate Shards from status unassigned 5.6.2	Elasticsearch	4	2216	March 3, 2018
Way to route primary shards back to other nodes in case of data node failure in cluster	Elasticsearch	3	546	July 6, 2017
Why shard unassigned after cluster restart completely?	Elasticsearch	1	384	May 28, 2020
Shards unassigned after some nodes went down	Elasticsearch	8	420	September 29, 2020