How do I get my primary shards back?


(Kyle Lahnakoski) #1

After disabling allocation

curl -X PUT -d "{\"persistent\": {\"cluster.routing.allocation.enable\": \"none\"}}"  http://localhost:9200/_cluster/settings

I shut down one of my nodes for maintenance. Now that it is back up, the primary shards remain unassigned, even though the data is still on the drive. What is the command to tell ES, "yes, you have the whole shard, please use it"?

I tried a /_cluster/reroute command on a single shard of an unimportant index. With "allow_primary": false it reports an error, and with "allow_primary": true it erases everything on that shard.
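For reference, a single-shard reroute in 1.x is an "allocate" command posted to /_cluster/reroute. The sketch below builds that body with allow_primary fixed to false. In 1.x, allow_primary: true allocates a new, empty primary, which is consistent with the data loss described above. The function names, the ES_URL variable and the index name in the usage line are hypothetical; only the endpoint and the allocate/allow_primary fields come from the 1.x reroute API.

```shell
#!/bin/sh
# Sketch for Elasticsearch 1.x. Function names and ES_URL are hypothetical.

# Build the body of a single "allocate" reroute command.
# $1 = index, $2 = shard number, $3 = target node name.
# allow_primary is deliberately fixed to false: in 1.x, allow_primary=true
# allocates a new EMPTY primary, discarding whatever data that shard had.
reroute_allocate_body() {
  printf '{"commands": [{"allocate": {"index": "%s", "shard": %d, "node": "%s", "allow_primary": false}}]}' \
    "$1" "$2" "$3"
}

# POST the command to the cluster (defaults to a local node).
reroute_allocate() {
  curl -s -XPOST "${ES_URL:-http://localhost:9200}/_cluster/reroute" \
    -d "$(reroute_allocate_body "$1" "$2" "$3")"
}
```

Usage, with a hypothetical index name and the node that appears as [primary] in the logs: reroute_allocate my_index 0 primary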

Thanks


(Magnus Bäck) #2

You did re-enable allocation after the node came back up, right?
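Re-enabling allocation is the same persistent settings call as the one used to disable it, with "all" instead of "none". A minimal sketch follows; the function names and ES_URL are hypothetical, while the setting name and its four 1.x values (all, primaries, new_primaries, none) come from the cluster allocation settings.

```shell
#!/bin/sh
# Sketch for Elasticsearch 1.x. Function names and ES_URL are hypothetical.

# Build the persistent settings body, rejecting values the setting does not accept.
allocation_body() {
  case "$1" in
    all|primaries|new_primaries|none) ;;
    *) echo "unknown allocation mode: $1" >&2; return 1 ;;
  esac
  printf '{"persistent": {"cluster.routing.allocation.enable": "%s"}}' "$1"
}

# PUT the setting to the cluster (defaults to a local node).
set_allocation() {
  body=$(allocation_body "$1") || return 1
  curl -s -XPUT "${ES_URL:-http://localhost:9200}/_cluster/settings" -d "$body"
}
```

Usage once the node has rejoined: set_allocation all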


(Kyle Lahnakoski) #3

My ES version is 1.7.1.

I will re-enable allocation and get back to you.


(Kyle Lahnakoski) #4

After enabling allocation, the serviced node starts replicating shards from the other nodes, but the primary shards remain unassigned.
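To see exactly which primaries are still unassigned, the _cat/shards API lists one shard copy per line. Its default columns are index, shard, prirep, state, docs, store, ip and node, so column 3 is "p" or "r" and column 4 is the state. The filter below is a hypothetical helper that reads that output on stdin and prints index and shard number for each unassigned primary.

```shell
#!/bin/sh
# Hypothetical helper: read _cat/shards output on stdin and print
# "index shard" for every primary (column 3 == "p") whose state
# (column 4) is UNASSIGNED. Unassigned rows have empty trailing
# columns, which awk simply treats as missing fields.
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}
```

Usage against a local node: curl -s localhost:9200/_cat/shards | unassigned_primaries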


(Kyle Lahnakoski) #5

Also, once allocation is enabled, I get these log messages:

[2016-01-23 19:08:18,962][INFO ][cluster.routing.allocation.decider] [primary] updating [cluster.routing.allocation.enable] from [NONE] to [ALL]
[2016-01-23 19:08:18,970][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,971][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,973][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,973][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 19:08:18,974][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot

(Kyle Lahnakoski) #6

I disabled shard allocation, shut down the problem node, and tried removing the translogs:

sudo find /data1 -name 'translog-*' -exec rm -rf {} \;

I started the node and enabled allocation, but the primary shards still did not come back:


[2016-01-23 19:59:52,626][INFO ][cluster.routing.allocation.decider] [primary] updating [cluster.routing.allocation.enable] from [ALL] to [NONE]
[2016-01-23 19:59:52,659][INFO ][http                     ] [primary] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.31.0.196:9200]}
[2016-01-23 19:59:52,662][INFO ][node                     ] [primary] started
[2016-01-23 20:01:51,095][INFO ][cluster.routing.allocation.decider] [primary] updating [cluster.routing.allocation.enable] from [NONE] to [ALL]
[2016-01-23 20:01:51,108][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,113][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,110][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,131][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot
[2016-01-23 20:01:51,109][INFO ][indices.store            ] [primary] Failed to open / find files while reading metadata snapshot


(Kyle Lahnakoski) #7

In an attempt to increase the available space on my node, I had changed my *.yml config file from

path.data: /data1, /data2, /data3

to

path.data: /data1, /data2, /data3, /data4

Undoing this change and restarting the node brought back all of my primary shards except one.


(Kyle Lahnakoski) #8

In summary, before version 2.0 you cannot change path.data or you will lose data. It appears that older versions of ES striped data across the paths, so changing the number of paths confused it. Luckily, the data was simply considered unusable rather than removed from the drive: reverting path.data let the shards recover.
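Given that experience, one cautious habit is to compare the data paths in a saved copy of the config against the current one before restarting a node. The sketch below is hypothetical: the function names and the idea of keeping a backup copy are assumptions, and it only understands a single-line, comma-separated path.data entry like the ones above (not the YAML list form). It flags a difference; it cannot tell whether a given change is safe.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Elasticsearch.

# Print each entry of a single-line, comma-separated "path.data:" setting
# from the given YAML file, one path per line, with whitespace trimmed.
data_paths() {
  sed -n 's/^path\.data:[[:space:]]*//p' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Succeed only if both config files list the same data paths in the same order.
same_data_paths() {
  [ "$(data_paths "$1")" = "$(data_paths "$2")" ]
}
```

Usage before a restart, with hypothetical file names: same_data_paths elasticsearch.yml.bak elasticsearch.yml || echo "path.data changed"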


(Boaz Leskes) #9

For the record: adding paths to the list shouldn't cause data to go away, so something else is going on here. Also, I know that in the past people (including me :)) recommended deleting the translog when recovery fails, but that only applied to translog corruption on old ES versions, where we didn't shut down correctly and corruption was expected. Those days are long gone, and newer versions will refuse to open a primary if the translog is missing.

I presume there is no way to dig deeper now and the data is long gone. If not, please ping me and we can investigate.

