Cluster recovery took a long time

mahmoud_moharam · August 8, 2018, 10:24am

I had a cluster with 2 nodes on AWS, then I released one of them and kept the other, so my cluster now has 1 master node with active_primary_shards at 38180.
The problem is that when I tested failover recovery time, it took 12 hours to recover and the cluster status turned "yellow", as shown in the attached image.

Christian_Dahlqvist · August 8, 2018, 11:05am

You have far too many shards for a cluster that size. Read this blog post for some guidance on shards and sharding practices.

mahmoud_moharam · August 8, 2018, 11:27am

Thanks for the reply, @Christian_Dahlqvist. Now I know that I have too many shards. Is there any way to avoid this very long recovery time, or a way to re-shard the cluster?

Christian_Dahlqvist · August 8, 2018, 11:31am

With that number of shards I am surprised your cluster is running at all. I think it may be the largest shard count I have ever seen for a single node. As you only have 1 node your replicas will never allocate, so it seems fine, at least from that perspective.
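To illustrate the point about replicas: Elasticsearch does not place a replica on the same node as its primary, so on a single node every replica stays unassigned and the cluster reports yellow. The sketch below is a simplified model of that rule, not Elasticsearch code. The function names and the replica count of 1 are assumptions for illustration; only the primary shard count of 38180 comes from the thread.

```python
def assignable_copies(data_nodes: int, replicas_per_primary: int) -> int:
    """Copies of one shard (primary plus replicas) that can be placed,
    given that two copies of the same shard never share a node."""
    return min(data_nodes, 1 + replicas_per_primary)


def unassigned_replicas(primaries: int, data_nodes: int, replicas_per_primary: int) -> int:
    """Replica shards left unassigned across the whole cluster."""
    placed = assignable_copies(data_nodes, replicas_per_primary)
    replicas_placed = max(placed - 1, 0)  # the first placed copy is the primary
    return primaries * (replicas_per_primary - replicas_placed)


def expected_status(data_nodes: int, replicas_per_primary: int) -> str:
    """Simplified health: red if no primary fits, yellow if some
    replica cannot be placed, green otherwise."""
    placed = assignable_copies(data_nodes, replicas_per_primary)
    if placed < 1:
        return "red"
    if placed < 1 + replicas_per_primary:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # 38180 primaries is from the thread; 1 replica per primary is assumed.
    print(expected_status(data_nodes=1, replicas_per_primary=1))
    print(unassigned_replicas(38180, data_nodes=1, replicas_per_primary=1))
```

Under this model, one node with one replica per primary is yellow, and two nodes are green. Setting `index.number_of_replicas` to 0 would also make a single node report green, but only by giving up redundancy, so it does nothing for the single point of failure.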

mahmoud_moharam · August 8, 2018, 11:47am

So, is there any way to avoid this single point of failure, so that a cluster failure does not take 12 hours to recover from, such as adding new nodes?

Christian_Dahlqvist · August 8, 2018, 12:53pm

Adding nodes may help and provide some temporary relief, but I am quite sure you will still need to change your sharding practices.

mahmoud_moharam · August 8, 2018, 12:56pm

OK, thanks for your kind help. If I decide to re-shard, what are the best options?

Christian_Dahlqvist · August 8, 2018, 12:57pm

As outlined in the blog I linked to, try to make sure you have quite large shards, as having lots of small indices and shards is very inefficient.
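The "fewer, larger shards" advice can be made concrete with some rough arithmetic. This is a minimal sketch: the total data size (1000 GB) and the target shard size (30 GB) are hypothetical values chosen for illustration, as the thread states neither, and the function names are made up here.

```python
import math


def average_shard_gb(total_data_gb: float, shard_count: int) -> float:
    """Average size of a primary shard for a given data volume."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return total_data_gb / shard_count


def primary_shards_for(total_data_gb: float, target_shard_gb: float) -> int:
    """Number of primary shards needed to hold the data at the target size."""
    if total_data_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    return max(1, math.ceil(total_data_gb / target_shard_gb))


if __name__ == "__main__":
    total_gb = 1000.0  # hypothetical; the thread does not state the data size
    print(round(average_shard_gb(total_gb, 38180), 3))  # GB per shard today
    print(primary_shards_for(total_gb, target_shard_gb=30.0))
```

With these assumed numbers, 38180 shards average about 0.026 GB each, while a 30 GB target needs only 34 primary shards. In practice, getting there usually means consolidating many small indices into fewer larger ones, for example by reindexing.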
