Elasticsearch cluster shards are not recovering (version 1.7)


(Suresh Itha) #1

Hi Team,
Unfortunately, we deleted the data folder on one of the data nodes in our Elasticsearch cluster. Since then the cluster has not recovered the shards and its status is always red, showing 184 unassigned shards. Please find the cluster health output below for reference.

{
"cluster_name": "XYZ1234",
"status": "red",
"timed_out": false,
"number_of_nodes": 50,
"number_of_data_nodes": 36,
"active_primary_shards": 2048,
"active_shards": 2048,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 184,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0
}

Please help me with this issue; it would be really helpful. This is a production cluster.


(David Pilato) #2

Best thing to do is to restore your data folder if you have a copy.
Otherwise you probably need to find which are the incomplete indices and DELETE them...
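
One way to find the incomplete indices is to count unassigned shards per index from the `_cat/shards` output. This is only a rough sketch: it assumes the default column order (index, shard, prirep, state, ...), and the function name `unassigned_per_index` and the host and port in the usage comment are illustrative.

```shell
#!/bin/bash
# Reads `_cat/shards` text on stdin and prints "<index> <unassigned shard count>",
# sorted by index name. Assumes the shard state is the 4th column.
unassigned_per_index() {
  awk '$4 == "UNASSIGNED" { count[$1]++ }
       END { for (i in count) print i, count[i] }' | sort
}

# Usage against a live node (host and port are assumptions):
#   curl -s 'http://localhost:9202/_cat/shards' | unassigned_per_index
```

Any index that appears in the output has at least one shard missing and is a candidate for deletion or reindexing.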


(Suresh Itha) #3

Thank you so much for the response. Unfortunately I don't have a backup of the data folder for this node, and I can't delete the indices: my cluster has 31 active indices in total, holding close to 7.5 TB of data.

Note: Because of this issue, all of the indices in my cluster are showing as incomplete.


(David Pilato) #4

If all 31 indices have primary shards missing and if you did not set replicas to 1, I'm afraid there is nothing to do.

You will probably have to reindex all the data from the source. That might be the right time to switch to 6.4.2 by the way.

I don't know your use case, but while we are at it, it seems that you have only 3-4 GB per shard. Maybe you should consider having 10x fewer shards, like 200 shards for your cluster?
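
The per-shard figure can be checked with a quick calculation. This is a back-of-the-envelope sketch using the numbers from this thread (about 7.5 TB, and 2048 active plus 184 unassigned shards, so 2232 in total); the helper name `avg_shard_gb` is made up for illustration, and it assumes data is spread evenly across shards.

```shell
#!/bin/bash
# Average shard size in GB: total data in TB (times 1024) divided by shard count.
avg_shard_gb() {
  awk -v tb="$1" -v shards="$2" 'BEGIN { printf "%.2f\n", tb * 1024 / shards }'
}

avg_shard_gb 7.5 2232   # current layout
avg_shard_gb 7.5 200    # the suggested layout of roughly 200 shards
```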


(Suresh Itha) #5

I don't have any replicas; my setting is "number_of_replicas": "0". We can't upgrade to the latest version because this is a legacy application. We have lost primary shards in all of the indices in my cluster. Please help me bring this cluster back.
Note: I am okay with losing the data in the unassigned shards.

Thanks


(David Pilato) #6

So, let me sum up:

  • You have 31 indices with 2232 shards in total, which means that each index has 72 shards.
  • You have no replicas.
  • You have 36 data nodes, which means there is a good chance that every index has at least one primary shard on every node.
  • You deleted the data dir from one of the nodes.

Because you don't have any replica, the primaries which were on this node are now gone.
Because you have so many shards per index, your 31 indices are now in RED state.
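
The arithmetic behind this can be sketched as follows. It assumes shards are balanced evenly across data nodes, which the thread does not confirm; the variable names are illustrative.

```shell
#!/bin/bash
total_shards=$((2048 + 184))   # active + unassigned, from the first health output
indices=31
data_nodes=36

# Shards per index, and how many shards of each index an average node would hold.
shards_per_index=$((total_shards / indices))
per_node_per_index=$((shards_per_index / data_nodes))

echo "shards per index: $shards_per_index"
echo "shards of each index per node (if balanced): $per_node_per_index"
```

With roughly two primaries of every index on each node and no replicas, wiping a single node's data directory leaves every index with missing primaries.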

I'm afraid you can't do anything to recover from that situation. But I guess this is not really important, as you defined "number_of_replicas": "0", which basically means that you don't really care about losing data.

That means that you need to rebuild your indices (your cluster?) from scratch.
And I believe that you don't have any backup either...


(Suresh Itha) #7

Yes, everything you have mentioned is correct. I wrote a shell script to force-allocate the unassigned shards. After running it, 120 of the 184 unassigned shards are still pending. Here is the script:
#!/bin/bash
OLDIFS=$IFS
IFS=$'\n'

# List the unassigned shards, one per line (index name, shard number, ...)
curl http://localhost:9202/_cat/shards | grep UNASSIGNED | sort > unassigned
for shard in $(cat unassigned);
do
index=$(echo "$shard" | awk '{print $1}')
lostshard=$(echo "$shard" | awk '{print $2}')
#echo "$index and $lostshard"

# Force-allocate an empty primary on the named node; the shard's old data is lost
curl -XPOST 'localhost:9201/_cluster/reroute' -d "{ \"commands\" : [ { \"allocate\" : { \"index\" : \"$index\", \"shard\" : $lostshard, \"node\": \"master-node-name\", \"allow_primary\": \"true\" } }] }"

done

IFS=$OLDIFS

Now I am getting the error below:

{"error":"ElasticsearchIllegalArgumentException[[allocate] allocation of [index_name][8] on node [node_name][thhPJF__Q4-aTzt6un4E9A][data_node][inet[/data_node:9302]]{tag=source01-data1-data-node, master=false} is not allowed, reason: [YES(shard is not allocated to same node or host)][NO(node does not match index include filters [tag:"source02*"])][YES(shard is primary)][YES(below primary recovery limit of [4])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(total shard limit disabled: [-1] <= 0)][YES(primary shard can be allocated anywhere)][YES(enough disk for shard on node, free: [1.2tb])][YES(no snapshots are currently running)]]","status":400}

Here is the current cluster health:

{
"cluster_name": "XYZ cluster name",
"status": "red",
"timed_out": false,
"number_of_nodes": 50,
"number_of_data_nodes": 36,
"active_primary_shards": 2112,
"active_shards": 2112,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 120,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0
}
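
The reroute error lists one verdict per allocation decider, and only the NO entries explain the rejection. A small sketch for pulling those out of a saved response is shown here; the function name `failed_deciders` and the file name in the usage comment are hypothetical, and the pattern assumes the reason text itself contains no closing parenthesis.

```shell
#!/bin/bash
# Reads a reroute error response on stdin and prints each NO(...) decider verdict.
failed_deciders() {
  grep -o 'NO([^)]*)'
}

# Usage (reroute_error.json is a hypothetical file holding the response):
#   failed_deciders < reroute_error.json
```

For the error above, the only NO verdict is the index include filter on `tag`, which means the target node's tag does not match what the index requires.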


(Suresh Itha) #8

The issue is fixed. I changed the tagging to data2 for the remaining pending shards, and the cluster is now green, but I lost the data. I have enabled my Storm pipeline to re-ingest my old data. Thanks for the support.

{
"cluster_name": "XYZ cluster name",
"status": "green",
"timed_out": false,
"number_of_nodes": 50,
"number_of_data_nodes": 36,
"active_primary_shards": 2232,
"active_shards": 2232,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0
}
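
For anyone hitting the same include-filter rejection, a rough sketch of checking which nodes satisfy an index's tag filter before choosing the target of the allocate command follows. Everything here is an assumption for illustration: the function name, the two-column "node-name tag" listing you would prepare yourself, and the node names and tag values in the usage comment. It handles a single wildcard pattern only, not a comma-separated list.

```shell
#!/bin/bash
# Reads "<node-name> <tag>" lines on stdin and prints the names of nodes whose
# tag matches the given wildcard pattern (for example 'source02*').
nodes_matching_tag() {
  local pattern=$1 name tag
  while read -r name tag; do
    case $tag in
      $pattern) echo "$name" ;;   # unquoted so the value is treated as a glob
    esac
  done
}

# Usage (nodes.txt is a hypothetical listing of node names and their tags):
#   nodes_matching_tag 'source02*' < nodes.txt
```

Only nodes printed by the function would pass the include filter, so one of them can be used as the "node" value in the reroute command.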

