Cluster recovery took a long time

I had a cluster with 2 nodes on AWS. I released one of them and kept the other, so my cluster now has 1 master node with active_primary_shards at 38180.
The problem is that when I tested failover recovery time, the cluster took 12 hours to recover and its status turned "yellow", as shown in the attached image.

You have far too many shards for a cluster that size. Read this blog post for some guidance on shards and sharding practices.

Thanks for the reply, @Christian_Dahlqvist. Now I know that I have too many shards. Is there any way to avoid this extremely long recovery time, or a way to re-shard the cluster?

With that number of shards I am surprised your cluster is running at all. I think it may be the largest shard count I have ever seen for a single node. As you only have 1 node, your replicas will never allocate, so the yellow status is expected and seems fine, at least from that perspective.
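
To illustrate why a single node stays yellow, here is a minimal sketch of the allocation rule involved: a replica is never placed on the same node as its primary. The function names are hypothetical, and the replica count of 1 is an assumption, since the thread does not state it. This is a simplified model, not how Elasticsearch itself computes health.

```python
def unassigned_replicas(primaries: int, replicas_per_primary: int, data_nodes: int) -> int:
    """Count replica copies that cannot be placed.

    Simplified model: the only constraint considered is that copies of the
    same shard must live on different nodes.
    """
    # Each shard can have at most (data_nodes - 1) replicas next to its primary.
    placeable_per_shard = min(replicas_per_primary, max(data_nodes - 1, 0))
    return primaries * (replicas_per_primary - placeable_per_shard)


def expected_status(primaries: int, replicas_per_primary: int, data_nodes: int) -> str:
    """Return 'green' if every replica can be placed, otherwise 'yellow'.

    Assumes all primaries are assigned, so 'red' is not modelled here.
    """
    missing = unassigned_replicas(primaries, replicas_per_primary, data_nodes)
    return "yellow" if missing > 0 else "green"


# 38180 primaries is the figure from this thread; 1 replica is an assumption.
print(expected_status(38180, 1, 1))
print(expected_status(38180, 1, 2))
```

Under these assumptions, one node leaves every replica unassigned, while a second node lets them all allocate.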

So, is there any way to avoid this single point of failure, so that the cluster does not fail and take 12 hours to recover, such as adding new nodes?

Adding nodes may help and provide some temporary relief, but I am quite sure you will still need to change your sharding practices.

OK, thanks for your kind help. If I decide to re-shard, what are the best options?

As outlined in the blog I linked to, try to make sure you have quite large shards, as having lots of small indices and shards is very inefficient.
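
As a rough way to turn "quite large shards" into a number, the sketch below estimates how many primary shards a given data volume needs for a chosen target shard size. The function name, the 30 GB target, and the 2000 GB data volume are all assumptions made for illustration; the thread gives neither the actual data size nor a specific target.

```python
import math


def primary_shards_needed(total_data_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate the primary shard count for a given data volume.

    target_shard_gb is an assumed target size, not a value from this thread.
    """
    if total_data_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so no shard exceeds the target; always keep at least one shard.
    return max(1, math.ceil(total_data_gb / target_shard_gb))


# Hypothetical data volume, compared with the 38180 primaries in this thread.
hypothetical_total_gb = 2000
suggested = primary_shards_needed(hypothetical_total_gb)
print(f"suggested primaries: {suggested} (currently 38180)")
```

With these assumed figures the estimate is orders of magnitude below the current shard count, which is the kind of gap the advice above points at.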
