Elasticsearch not reassigning shards after node rejoined

A large number of indices and shards leads to a larger cluster state and more entities to gather stats for and from. David is probably in a better position to quantify the potential impact.

I could believe it, yes. Some stats are collected per shard, so with thousands of shards it could be slow to obtain them. I'd missed the shard count in this cluster, but @Christian_Dahlqvist is right: it's pretty high and likely to cause issues like this.
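To see how shards are spread across nodes, one option is to tally the output of `GET _cat/shards?h=index,shard,prirep,state,node`. The sketch below is only an illustration: the helper name `shards_per_node` and the sample rows are hypothetical, and it assumes node names contain no spaces and that unassigned shards have an empty node column.

```python
from collections import Counter


def shards_per_node(cat_shards_text):
    """Count shards per node from `_cat/shards?h=index,shard,prirep,state,node` text."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Assumption: a missing fifth column means the shard is unassigned.
        node = fields[4] if len(fields) >= 5 else "UNASSIGNED"
        counts[node] += 1
    return counts


# Hypothetical sample output, not taken from the cluster in this thread.
sample = """\
logs-2020.01.01 0 p STARTED node-1
logs-2020.01.01 0 r STARTED node-2
logs-2020.01.02 0 p STARTED node-1
logs-2020.01.02 0 r UNASSIGNED
"""

for node, count in sorted(shards_per_node(sample).items()):
    print(node, count)
```

A total in the thousands is the kind of shard count being discussed here.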

Not really. If you're going to run out of management threads then adding more threads will just delay the inevitable. I think it'd be better to increase the timeouts for that stats collection to a level where it reliably completes on your cluster, because this'll limit how many of these requests might be happening in parallel. You only get parallel requests when one times out. Try these:

xpack.monitoring.collection.node.stats.timeout: 60s
xpack.monitoring.collection.index.stats.timeout: 60s
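If you'd rather not edit the config file and restart, these settings may also be updatable through the cluster settings API; that is an assumption to check against the documentation for your version. A minimal sketch that only builds the request body for `PUT _cluster/settings` (the helper name `monitoring_timeout_settings` is hypothetical, and nothing is sent to the cluster):

```python
import json


def monitoring_timeout_settings(timeout="60s"):
    """Build a cluster-settings body raising both monitoring stats timeouts."""
    return {
        # "persistent" survives full cluster restarts; "transient" would not.
        "persistent": {
            "xpack.monitoring.collection.node.stats.timeout": timeout,
            "xpack.monitoring.collection.index.stats.timeout": timeout,
        }
    }


# Print the JSON body; send it with your usual client as PUT _cluster/settings.
print(json.dumps(monitoring_timeout_settings(), indent=2))
```

Start from 60s and raise it only if the collection still times out.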

Right. I will look into that and will get back to you in a couple of days. Thanks!

All right, it seems the disk and performance problems were related not to the shard count but to the cloud provider having messed up the VM disks' filesystems.
Changes were introduced and so far the cluster works fine. I believe the problem to be resolved. Many thanks for your help, guys!
