Shard replication/recovery going slow

Ant · July 20, 2018, 7:38am

I recently had a node fail completely and had to rebuild it. There are only 3 nodes in the cluster, and it took 5 days for the cluster to get back to green. Some indexes are a few MB and others are 300 GB, and it looked like recovery was rate limited on how many indexes or shards it would process per hour. When it hit a run of the smaller indexes it would sit almost idle: it sent as many as it was willing to and then just waited. In contrast, when it hit the bigger indexes there was a flurry of activity, and looking at the disk usage chart you could see it suddenly start shooting up.

I'm guessing there are some settings to control this, and I'd like to configure them so that this mix of index sizes isn't such an issue for me. I also recently had to restart a node (which I may have to rebuild), and it took a day for all the shards to be marked active, since each node hosts some 12k shards. If anyone knows which settings I should look at to resolve this, that would be great. I know there is one that limits transfer speed, but since recovery did make heavy use of the NIC when it hit the larger indexes, I don't think that's the problem. I guess I'm looking to increase how often it checks whether it's ready to send something else, and maybe the concurrency.
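For reference, the knobs usually associated with recovery concurrency and bandwidth are `cluster.routing.allocation.node_concurrent_recoveries` (default 2), `cluster.routing.allocation.node_initial_primaries_recoveries` (default 4) and `indices.recovery.max_bytes_per_sec` (default 40mb in 6.x). The sketch below only builds a JSON body for `PUT _cluster/settings`; the raised values and the helper name `recovery_settings_body` are hypothetical and for illustration only, so check the documentation for your version before applying anything. Raising them may shorten the idle gaps between small shards, but it does not reduce the per-shard cluster-state overhead discussed in the reply below.

```python
import json


def recovery_settings_body(node_concurrent_recoveries=4,
                           node_initial_primaries_recoveries=8,
                           max_bytes_per_sec="100mb"):
    """Build a hypothetical request body for PUT _cluster/settings.

    The default arguments are illustrative values, not recommendations.
    """
    return {
        # "transient" settings are lost on a full cluster restart;
        # use "persistent" instead if they should survive one.
        "transient": {
            # Concurrent incoming/outgoing peer recoveries per node (default 2).
            "cluster.routing.allocation.node_concurrent_recoveries":
                node_concurrent_recoveries,
            # Concurrent primary recoveries from local disk per node,
            # e.g. after a node restart (default 4).
            "cluster.routing.allocation.node_initial_primaries_recoveries":
                node_initial_primaries_recoveries,
            # Bandwidth throttle for peer recovery per node (default 40mb).
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


if __name__ == "__main__":
    # Pipe this into e.g.:
    # curl -XPUT localhost:9200/_cluster/settings \
    #      -H 'Content-Type: application/json' -d @-
    print(json.dumps(recovery_settings_body(), indent=2))
```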

Thanks in advance
Ant

Christian_Dahlqvist · July 20, 2018, 8:06am

It sounds like you simply have far too many shards for the size of your cluster. Have a look at this blog post about shards and sharding for guidance. A large number of indices and shards leads to a large cluster state, which can be slow to update on every change to shard allocation.
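To put the 12k shards per node in perspective, here is a quick check against a rule of thumb that was commonly cited in Elastic's sharding guidance around that time: keep the number of shards per node below roughly 20 per GB of configured JVM heap. Treat that ratio as an assumption rather than a hard limit. The 30 GB heap is also hypothetical, since the question doesn't state the actual heap size; it was picked as a size just under the usual ~32 GB compressed-oops ceiling.

```python
# Assumed rule of thumb (not a hard limit): at most ~20 shards per GB of heap.
SHARDS_PER_GB_HEAP = 20


def max_recommended_shards(heap_gb: float) -> int:
    """Upper bound on shards per node under the assumed rule of thumb."""
    return int(heap_gb * SHARDS_PER_GB_HEAP)


def overshoot_factor(shards_on_node: int, heap_gb: float) -> float:
    """How many times the node exceeds the rule-of-thumb shard count."""
    return shards_on_node / max_recommended_shards(heap_gb)


if __name__ == "__main__":
    heap_gb = 30          # hypothetical heap size per node
    shards = 12_000       # per-node shard count from the question
    print(max_recommended_shards(heap_gb))    # 600
    print(overshoot_factor(shards, heap_gb))  # 20.0
```

Even with a generous heap, each node would be carrying around 20 times the suggested shard count, which is consistent with shard-level overhead, rather than raw transfer bandwidth, dominating recovery time.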

system · August 17, 2018, 8:06am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
