Shards unassigned after elasticsearch restart

(EatDataForBreakfast) #1

My Elasticsearch instances were restarted, and since then I have about 10 unassigned shards :frowning:
The reason seems to be node_left. How can I re-assign the shards to their respective nodes?

"cluster_name" : "santorini_rec",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 3,
"active_primary_shards" : 16,
"active_shards" : 18,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 10,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 2,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 56.25

Does this mean I lost all the data?

.monitoring-es-6-2017.08.05 0 r UNASSIGNED NODE_LEFT
santo_pipeline_data 5 r UNASSIGNED NODE_LEFT
santo_pipeline_data 3 r UNASSIGNED NODE_LEFT
santo_pipeline_data 8 r UNASSIGNED NODE_LEFT
santo_pipeline_data 2 r UNASSIGNED NODE_LEFT
santo_pipeline_data 1 r UNASSIGNED PRIMARY_FAILED
.monitoring-alerts-6 0 r UNASSIGNED NODE_LEFT
.watcher-history-3-2017.08.05 0 r UNASSIGNED NODE_LEFT
.monitoring-kibana-6-2017.08.05 0 r UNASSIGNED NODE_LEFT
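A listing like the one above can be tallied to see which kinds of shard copies are affected. The following is a minimal Python sketch, assuming the columns are index, shard, prirep (p or r), state, and unassigned reason; the post does not show the command that produced the listing, and `summarize_unassigned` is a hypothetical helper name.

```python
from collections import Counter

# Rows copied from the listing above.
# Assumed columns: index, shard, prirep, state, unassigned reason.
CAT_SHARDS_OUTPUT = """\
.monitoring-es-6-2017.08.05 0 r UNASSIGNED NODE_LEFT
santo_pipeline_data 5 r UNASSIGNED NODE_LEFT
santo_pipeline_data 3 r UNASSIGNED NODE_LEFT
santo_pipeline_data 8 r UNASSIGNED NODE_LEFT
santo_pipeline_data 2 r UNASSIGNED NODE_LEFT
santo_pipeline_data 1 r UNASSIGNED PRIMARY_FAILED
.monitoring-alerts-6 0 r UNASSIGNED NODE_LEFT
.watcher-history-3-2017.08.05 0 r UNASSIGNED NODE_LEFT
.monitoring-kibana-6-2017.08.05 0 r UNASSIGNED NODE_LEFT
"""


def summarize_unassigned(text):
    """Count unassigned shard copies by (prirep, reason)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or short lines and anything that is not unassigned.
        if len(fields) < 5 or fields[3] != "UNASSIGNED":
            continue
        prirep, reason = fields[2], fields[4]
        counts[(prirep, reason)] += 1
    return counts


summary = summarize_unassigned(CAT_SHARDS_OUTPUT)
# Primaries are marked "p"; replicas are marked "r".
unassigned_primaries = sum(n for (prirep, _), n in summary.items() if prirep == "p")
print(summary)
print("unassigned primaries:", unassigned_primaries)  # unassigned primaries: 0
```

Every row in this listing is a replica (`r`), so on this evidence no primary is unassigned, which is consistent with the cluster reporting yellow rather than red.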

I tried to enable re-allocation, but it didn't help.

{ "transient": {
"cluster.routing.allocation.enable": "all"
} }

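The fragment above looks like the body of a cluster settings update. As a rough sketch of what the full request might look like from Python, assuming the cluster is reachable at `http://localhost:9200` (an assumption; adjust to your host) and using the hypothetical helper name `build_enable_allocation_request`:

```python
import json
import urllib.request


def build_enable_allocation_request(base_url="http://localhost:9200"):
    """Build (but do not send) a PUT _cluster/settings request that sets
    cluster.routing.allocation.enable to "all" as a transient setting."""
    body = {"transient": {"cluster.routing.allocation.enable": "all"}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_enable_allocation_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a real cluster: urllib.request.urlopen(req)
```

This setting only lifts a block on allocation. It cannot place a replica when no eligible node exists to hold it.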
Elasticsearch seems to be so sensitive about any operation. Every time I touch it, something or other goes wrong, and it's just so hard to recover.

Can someone please help?

(Mark Walkom) #2

What version are you on?

(EatDataForBreakfast) #3

@warkolm I am on 5.5.1

I just tried turning off all replicas and turning them back on again; not sure if it will help.
It's still initializing the shards.

(Mark Walkom) #4

Why were the instances restarted?

(EatDataForBreakfast) #5

I disabled swapping on all hosts, and I had to tune some networking-related parameters. So I gracefully shut down Elasticsearch using systemd, except for the master node.

(Mark Walkom) #6

Did you shut down all the data nodes at once?

(EatDataForBreakfast) #7

No, one by one, making sure each instance came up.

(EatDataForBreakfast) #8

Not sure if it was a good idea, but I set the replicas to 0 and back to 10 again...

"cluster_name" : "santorini_rec",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 3,
"active_primary_shards" : 16,
"active_shards" : 23,
"relocating_shards" : 0,
"initializing_shards" : 6,
"unassigned_shards" : 84,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 20.353982300884958

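The two health outputs in this thread can be compared with a little arithmetic. This is a sketch only: the formula (active copies divided by active plus initializing plus unassigned) is inferred from the numbers posted here, with relocating shards at 0 in both cases, and `active_shards_percent` is a hypothetical helper name.

```python
def active_shards_percent(active, initializing, unassigned):
    """Percentage of shard copies that are active (formula inferred from the
    posted health output; an empty cluster is treated as 100 in this sketch)."""
    total = active + initializing + unassigned
    return 100.0 * active / total if total else 100.0


before = active_shards_percent(active=18, initializing=4, unassigned=10)
after = active_shards_percent(active=23, initializing=6, unassigned=84)
print(round(before, 2))  # 56.25
print(round(after, 2))   # 20.35
```

The number of active copies went up (18 to 23) while the percentage dropped, because the new replica setting added many more copies than were there before.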
(EatDataForBreakfast) #9

I still see indexing working and the logs being ingested.
Should I shut down Logstash until I fix the state, or should it be OK?

(EatDataForBreakfast) #10

All the unassigned shards are replicas, and I'm unable to recover them. I'd appreciate any suggestions.

(Mark Walkom) #11

Why do you need 10 replicas?

(EatDataForBreakfast) #12

Actually I don't. I created 9 shards for 3 nodes, and it automatically created 10 replicas total.
Some of them recovered automatically.

(Mark Walkom) #13

You definitely have more than one replica set, and if you have more than 2 then those extras won't even be assigned.

(EatDataForBreakfast) #14

Can I reduce the number of replicas now? What solution would you suggest?

(Mark Walkom) #15

Usually having more than 1 replica set is not required unless you have unstable infrastructure or low volumes of data queried at a high rate.

You have 3 data nodes, that means you can only store the primary and 2 replica sets of the data. Setting more will mean unassigned (replica) shards, which causes the yellow status until you either have more nodes to hold them or you reduce the replica count to fit in your cluster.
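To put numbers on that, here is a small Python sketch. It relies only on the rule that two copies of the same shard are never placed on the same node, and it ignores other constraints such as disk watermarks or allocation filtering. The function names are hypothetical, and the 9-shard, 10-replica figures are taken from the earlier posts.

```python
def max_assignable_replicas(data_nodes):
    """Each copy of a shard needs its own node, and one node holds the primary."""
    return max(data_nodes - 1, 0)


def unassignable_replica_copies(primaries, replicas, data_nodes):
    """Replica copies that can never be placed with this many data nodes."""
    extra_per_shard = max(replicas - max_assignable_replicas(data_nodes), 0)
    return primaries * extra_per_shard


print(max_assignable_replicas(3))             # 2
print(unassignable_replica_copies(9, 10, 3))  # 72
print(unassignable_replica_copies(9, 2, 3))   # 0
```

With 3 data nodes, any replica count above 2 leaves copies permanently unassigned, so the cluster stays yellow until the count is lowered or nodes are added.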

curl -XPUT 'localhost:9200/*/_settings' -d '{ "index" : { "number_of_replicas" : N } }', where N is the replica count you want.

(EatDataForBreakfast) #16

Thank you @warkolm. That makes sense. I have adjusted my replicas accordingly and got it to green now :slight_smile:
