Has anybody experienced this? Or is this normal?
After adding 6 new data nodes, high CPU usage (bouncing off 100%) often persisted for several days (around 5).
The shards were rebalanced within a day of adding the new nodes, so the high CPU is not caused by shard movement. I confirmed this by listing the running tasks.
I've noticed that disk writes on the new data nodes are very high, which seems to point to an unbalanced primary vs. replica shard allocation on the new nodes.
Is this normal though? Why would the recovery algorithm be so unbalanced?
Just for context, there are 48 old data nodes and 6 new data nodes.
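
In case it helps anyone checking the same thing, here is a rough sketch of how the primary vs. replica split per node could be tallied. It assumes an Elasticsearch-style cluster where `GET _cat/shards?format=json&h=index,shard,prirep,state,node` returns one row per shard; the node and index names in the sample rows are made up for illustration, and `count_shards_by_role` is just a hypothetical helper, not part of any client library.

```python
from collections import defaultdict


def count_shards_by_role(shard_rows):
    """Count started primary and replica shards per node.

    shard_rows is assumed to be the parsed JSON from a _cat/shards call:
    a list of dicts with "prirep" ("p" or "r"), "state", and "node" keys.
    """
    counts = defaultdict(lambda: {"primary": 0, "replica": 0})
    for row in shard_rows:
        node = row.get("node")
        # Skip unassigned or still-initializing shards; they are not serving writes.
        if row.get("state") != "STARTED" or not node:
            continue
        role = "primary" if row.get("prirep") == "p" else "replica"
        counts[node][role] += 1
    return dict(counts)


# Hypothetical sample rows (node and index names are invented).
sample_rows = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "node": "new-node-1"},
    {"index": "logs-1", "shard": "1", "prirep": "p", "state": "STARTED", "node": "new-node-1"},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "STARTED", "node": "old-node-7"},
    {"index": "logs-1", "shard": "1", "prirep": "r", "state": "STARTED", "node": "old-node-9"},
    {"index": "logs-2", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]

summary = count_shards_by_role(sample_rows)
for node, c in sorted(summary.items()):
    print(f"{node}: primaries={c['primary']} replicas={c['replica']}")
```

Running this against the real `_cat/shards` output and comparing the 6 new nodes with the 48 old ones should show whether the new nodes really hold a disproportionate share of primaries.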