We have a 7.17.4 cluster with 10 data nodes. Our main index is 10 shards with 1 replica, and it rolls over at 500 GB. I added 2 data nodes, expecting the incoming shards to spread out over the 12 nodes, maybe 2 per new node but the rest on the old nodes. Well, that was wrong. After googling a bit, I now realize the masters use shard count to determine shard allocation, and the result was that my 10 shard index, both primary AND replica shards, was getting written SOLELY to those 2 new nodes, which DRASTICALLY slowed down the ingestion rate (queue is WAY backed up).
Realizing what it was doing, I added 3 more nodes. It's helped, but since it's still writing data to just 5 nodes, I'm still not getting the throughput I was getting with the original 10 nodes.
Is my only real option going to be to add about 5 nodes at a time so that it allocates 2 primary shards per new node + 2 replicas per new node?
I am trying something now: I lowered the rollover to a ridiculously small number: 20 GB. Now it should perform rollovers at an artificially high rate (every 20 minutes). The short term goal is to rapidly increase the number of shards on the new nodes with the hope that it will more rapidly reach parity with the original 10 nodes. The medium term goal is that the allocator will start to spread out allocation amongst all 15 nodes which should then ramp the ingestion rate back up.
Is this reasonable? Is there a better/easier way to add one or two new nodes without ALL of the data getting written on those 2 new nodes (throttling ingestion throughput) ? Is my only option going to be to add some number of nodes that are close to my shard count?