Hi Team,
This is urgent. I doubt recovery is possible, but any help would be great.
My ES cluster's data nodes are in an autoscaling group, and we made a mistake while updating the stack. As a result, 4 of our 8 data nodes were terminated. New nodes came up, but the cluster is now RED because 450 shards are unassigned.
To get ELK running again I closed all the indices that had unassigned shards, but I need to get that data back. I have tested snapshots but not yet put them in place, so restoring from a backup does not look like an option.
Here are some configuration details:
ES cluster: 1 master, 1 client, 8 data nodes
Config: 4 shards and 2 replicas
Size: 150 GB (before the incident)
If you have 3 copies of each shard (1 primary and 2 replicas) and have not used shard allocation awareness to ensure that at least one replica for each shard is present on the data nodes that remain, it is possible that some shards may have had all copies allocated to the nodes that were terminated. If this is the case and you do not have a snapshot to restore from, you will either need to bring the nodes you terminated back up again (if that is even possible) or reindex your data from an external source.
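To get a rough feel for how likely that is, here is a minimal back-of-the-envelope sketch in Python. It assumes the 3 copies of a shard sit on 3 distinct data nodes chosen uniformly at random, which is a simplification: Elasticsearch balances shards rather than placing them randomly, so treat the result as an order-of-magnitude estimate only. The function name `prob_all_copies_lost` is made up for this illustration.

```python
from math import comb  # requires Python 3.8+


def prob_all_copies_lost(total_nodes, lost_nodes, copies):
    """Chance that every copy of one shard lived on a terminated node.

    Assumption (not how Elasticsearch actually allocates): the copies are
    placed on distinct nodes picked uniformly at random.
    """
    if copies > lost_nodes:
        # Not enough terminated nodes to have held all copies.
        return 0.0
    # Ways to put all copies on lost nodes / ways to put them anywhere.
    return comb(lost_nodes, copies) / comb(total_nodes, copies)


# Numbers from this thread: 8 data nodes, 4 terminated, 1 primary + 2 replicas.
p = prob_all_copies_lost(total_nodes=8, lost_nodes=4, copies=3)
print(f"{p:.4f}")  # 4/56, prints 0.0714
```

Under that assumption, roughly 1 shard in 14 would have lost every copy. Any shard that still has at least one copy on a surviving node should be able to recover from that copy.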