In a previous discussion, someone claimed that we shouldn't worry about balancing primary shards across all nodes, because primaries and replicas do almost the same amount of work during indexing.
Currently my primaries are highly unbalanced:
I'm having a problem with my bulk indexing thread pool queue filling up and tasks being rejected, and the nodes with the most primaries have the most rejections:
    http es5-client01.c.fp:9200/_cat/thread_pool | grep bulk | grep data | sort
    es5-data01 bulk 8 172 23634
    es5-data02 bulk 1 0 13082
    es5-data03 bulk 1 0 0
    es5-data04 bulk 1 0 10812
    es5-data05 bulk 0 0 1112
    es5-data06 bulk 0 0 2071
    es5-data07 bulk 0 0 0
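To make the skew in that output easier to see, here is a small sketch that parses it and summarizes which nodes have queued work, which are idle, and how the rejections are spread. It assumes the columns are the `_cat/thread_pool` defaults in 5.x (`node_name name active queue rejected`), since the command was run without `?v`; as far as I know the `rejected` counter is cumulative since each node started, so it reflects history, not just the current moment. The function name `parse_thread_pool` is made up for illustration.

```python
# Bulk thread pool output copied from the post above.
# Assumed columns (ES 5.x defaults): node_name, name, active, queue, rejected.
CAT_OUTPUT = """\
es5-data01 bulk 8 172 23634
es5-data02 bulk 1 0 13082
es5-data03 bulk 1 0 0
es5-data04 bulk 1 0 10812
es5-data05 bulk 0 0 1112
es5-data06 bulk 0 0 2071
es5-data07 bulk 0 0 0
"""


def parse_thread_pool(text):
    """Hypothetical helper: return {node: (active, queue, rejected)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        node, _pool, active, queue, rejected = fields
        stats[node] = (int(active), int(queue), int(rejected))
    return stats


stats = parse_thread_pool(CAT_OUTPUT)

# Nodes with tasks waiting in the bulk queue right now.
busy = [node for node, (_, queue, _) in stats.items() if queue > 0]
# Nodes with nothing running and nothing queued.
idle = [node for node, (active, queue, _) in stats.items() if active == 0 and queue == 0]
total_rejected = sum(rejected for _, _, rejected in stats.values())

print("queued:", busy)
print("idle:", idle)
print("rejected total:", total_rejected)
# Share of all rejections per node, largest first.
for node, (_, _, rejected) in sorted(stats.items(), key=lambda kv: -kv[1][2]):
    print(f"{node}: {rejected} rejected ({rejected / total_rejected:.1%})")
```

With this input, only es5-data01 has a non-empty queue while three data nodes sit completely idle, which is the pattern described below.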
The fact that some nodes have a non-empty queue while others show no active jobs leads me to think that such an unbalanced cluster must hurt indexing performance. Jobs are waiting in the queue on one node when they could be processed on another if the primaries were better balanced. Am I wrong here?
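To put a number on the primary imbalance next to the rejection counts, a sketch like the following could tally primaries and replicas per node. It assumes the input is the output of `_cat/shards?h=node,prirep` (one shard per line, node name then `p` or `r`); the helper name and the sample below are hypothetical, with node names borrowed from the post but shard counts invented purely for illustration.

```python
from collections import Counter


def primaries_per_node(cat_shards_text):
    """Hypothetical helper: count primary ('p') and replica ('r') shards per node.

    Expects `_cat/shards?h=node,prirep` output. Unassigned shards have an
    empty node column, so they split into a single field and are skipped.
    """
    primaries = Counter()
    replicas = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # unassigned shard or blank line
        node, prirep = fields
        if prirep == "p":
            primaries[node] += 1
        elif prirep == "r":
            replicas[node] += 1
    return primaries, replicas


# Invented sample, NOT the poster's cluster.
SAMPLE = """\
es5-data01 p
es5-data01 p
es5-data01 r
es5-data02 p
es5-data02 r
es5-data03 r
es5-data03 r
 p
"""

primaries, replicas = primaries_per_node(SAMPLE)
for node in sorted(set(primaries) | set(replicas)):
    # Counter returns 0 for nodes that hold no shards of that kind.
    print(node, "primaries:", primaries[node], "replicas:", replicas[node])
```

Lining these counts up against the per-node `rejected` column would show whether rejections really track primary count on a given cluster.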