ES: 7.6.x
3 coordinator nodes, 3 master nodes, 14 data nodes
6 primary shards, 1 replica per index
All primary shards of certain indices keep getting assigned to just 1 of the 14 data nodes. For example, data node 12 holds all 6 primary shards of index x, all 6 of index y, and so on, while the other 13 data nodes look normal.
This results in high JVM heap usage on data node 12 and in unassigned shards (reason: `CircuitBreakingException`).
Why would this happen for only data node 12?
What I have tried:
- Set `-XX:CMSInitiatingOccupancyFraction=50`
- Find each index with all of its primary shards on data node 12 and set `index.routing.allocation.total_shards_per_node` to 1 or 2
a. This helps, but all primary shards of some new indices still get allocated to data node 12 (a sketch of the settings call I used is below)
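
For reference, a minimal sketch of the per-index call, in Kibana console syntax; the index name `index_x` is a placeholder for each affected index:

```
PUT /index_x/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
```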
What I have not tried:
- Set `index.routing.allocation.total_shards_per_node` in an index template (see the template sketch after this list)
a. Not sure how this would work when setting `index.routing.allocation.require._name` (I cannot use ILM for some reason, so I wrote the shrink process myself, and I need all primary shards of an index on the same node in order to shrink. See here). The shrink-prep sketch after this list shows where the two settings would collide.
- Change `circuitBreaker` settings
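
If I go the template route, a minimal sketch using the legacy template API (composable `_index_template` templates only arrived in 7.8); the template name and index pattern are placeholders:

```
PUT _template/cap_shards_per_node
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.routing.allocation.total_shards_per_node": 2
  }
}
```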
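
And for context, this is roughly the shrink preparation step where the template setting would get in the way, since it pins every shard of the index onto one node while `total_shards_per_node` caps that same node. The node name `data-node-12` is a placeholder, and presumably the cap would have to be reset (e.g. to `null`) on the index at the same time:

```
PUT /index_x/_settings
{
  "index.routing.allocation.require._name": "data-node-12",
  "index.blocks.write": true,
  "index.routing.allocation.total_shards_per_node": null
}
```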
EDIT: A coworker found this, and it looks like it is the problem.