I have a cluster with 7 data nodes and 100+ indexes, with shard counts varying from 7 to 14 and 1 replica each. As the indexes are time-series, all of them get created at 12 AM UTC. The problem is that all the primary shards of most of the indexes are allocated to one node (say N1), and only a few primary shards and replicas are assigned to the other nodes.
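For context, the daily indexes are created from a template roughly like the one below. This is only a sketch: the template name, index pattern, and the exact shard count are placeholders, not my real configuration; only the per-index shard range and the single replica match my setup.

```bash
# Hypothetical legacy index template for the daily time-series indexes.
# "daily-metrics" and "metrics-*" are placeholder names; the shard count
# in my cluster varies between 7 and 14 per index.
curl -X PUT "http://localhost:9200/_template/daily-metrics" \
  -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["metrics-*"],
  "settings": {
    "index.number_of_shards": 10,
    "index.number_of_replicas": 1
  }
}'
```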
The following values are set:
- cluster.routing.rebalance.enable: all
- cluster.routing.allocation.allow_rebalance: indices_all_active
- thread_pool.bulk.queue_size: 1000
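For reference, this is how the two routing settings are applied (a minimal sketch via the cluster settings API; the host is a placeholder). The thread_pool.bulk.queue_size value is a static node-level setting, so it lives in each node's elasticsearch.yml rather than in this API call.

```bash
# Dynamic shard-routing settings applied cluster-wide.
# thread_pool.bulk.queue_size is static and is set per node in
# elasticsearch.yml, not through this endpoint.
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.rebalance.enable": "all",
    "cluster.routing.allocation.allow_rebalance": "indices_all_active"
  }
}'
```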
During bulk indexing, all of the requests go to that node (N1) and its CPU utilization spikes. A lot of requests are also rejected because the bulk queue size is exceeded on that node, whereas the other nodes are pretty much idle.
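This is roughly how I am observing the skew (a sketch; the host is a placeholder). The _cat/shards output shows most primary ("p") shards on N1, and _cat/thread_pool shows the bulk rejection counter climbing only on that node.

```bash
# Shard placement: rows with prirep "p" are primaries; most land on N1.
curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node"

# Per-node bulk thread pool stats; "rejected" increases only on N1.
curl -s "http://localhost:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected"
```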
Questions:
- Is the above issue happening because all the primary shards are on one node only?
- If yes, can I rebalance my primary shards by setting "cluster.routing.rebalance.enable" to "primaries" (see the sketch after this list)? Would this configuration rebalance my primary shards first and then balance the replicas? Are there any repercussions?
- Is there any other cause of this issue, and is there a way to mitigate it?
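For clarity, this is the change I am considering for the second question (a sketch via the cluster settings API; I have not applied it yet):

```bash
# Proposed change: allow rebalancing for primary shards only.
# Valid values for this setting are: all, primaries, replicas, none.
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.rebalance.enable": "primaries"
  }
}'
```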