Adding new data nodes to a busy ingesting cluster

We have a 7.17.4 cluster with 10 data nodes. Our main index has 10 primary shards with 1 replica, and it rolls over at 500 GB. I added 2 data nodes, expecting the incoming shards to spread out over all 12 nodes, maybe 2 shards per new node with the rest on the old nodes. Well, that was wrong. After googling a bit, I now realize the master weighs shard counts per node when allocating shards, and the result was that my 10-shard index, both primary AND replica shards, was getting written SOLELY to those 2 new nodes, which DRASTICALLY slowed down the ingestion rate (the queue is WAY backed up).
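For anyone hitting the same thing, you can see the skew by listing where the current write index's shards actually landed; this is just a quick check, and the index name below is a placeholder:

```
# Show which node each primary (p) and replica (r) shard of the write index is on
GET _cat/shards/logs-000123?v&h=index,shard,prirep,state,node

# Overall shard counts and disk usage per data node
GET _cat/allocation?v
```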

Realizing what it was doing, I added 3 more nodes. It's helped, but since it's still writing data to just 5 nodes, I'm still not getting the throughput I was getting with the original 10 nodes.

Is my only real option going to be to add about 5 nodes at a time so that it allocates 2 primary shards per new node + 2 replicas per new node?

I am trying something now: I lowered the rollover to a ridiculously small number: 20 GB. Now it should perform rollovers at an artificially high rate (every 20 minutes). The short term goal is to rapidly increase the number of shards on the new nodes with the hope that it will more rapidly reach parity with the original 10 nodes. The medium term goal is that the allocator will start to spread out allocation amongst all 15 nodes which should then ramp the ingestion rate back up.
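In case it helps anyone else, this is roughly what that change looks like if rollover is driven by an ILM policy; the policy name and the 20gb value are illustrative, and in 7.x you could use max_primary_shard_size instead if you prefer per-shard sizing:

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "20gb"
          }
        }
      }
    }
  }
}
```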

Is this reasonable? Is there a better/easier way to add one or two new nodes without ALL of the data getting written to those new nodes (and throttling ingestion throughput)? Is my only option going to be to add a number of nodes that is close to my shard count?

I think you should be able to set index.routing.allocation.total_shards_per_node in your index template to force a more even distribution of shards for every index. Be sure to set it high enough that both primaries and replicas can still be allocated even if nodes fail. With 20 shards in total (primaries plus replicas) and 15 nodes, you should be able to set it to 2, as that would still let you lose 5 nodes and have all shards allocated.
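For reference, with composable index templates in 7.x that would look roughly like this; the template name and index pattern are placeholders, and since the setting is dynamic you can also apply it directly to the existing write index so it takes effect right away:

```
# Applies to indices created in the future
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.routing.allocation.total_shards_per_node": 2
    }
  }
}

# Applies immediately to the current write index (placeholder name)
PUT logs-000123/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
```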

Christian, I wanted to confirm that adding this parameter to my index template fixed my "all shards are getting allocated on the 2 new / 5 new nodes" problem. Once the shards spread out over all of the nodes, the throughput went back up to what we were used to. Then we doubled our Logstash nodes pulling from Kinesis and ramped up our throughput. At this point, the ES data nodes were not the bottleneck.

Thank you so much!
