We have an ES cluster with 19 total nodes (3 master-only, 16 data-only) split across two racks, A and B. We have forced allocation awareness on rack via `cluster.routing.allocation.awareness.force.rack.values`, which is working properly (no primary and replica of the same shard on the same rack). We can, for example, shut down all A nodes and stay at yellow status. In total, we have 46 indices, each with 32 primary shards and 1 replica, with a per-node shard limit of 4, so each index has 4 shards on every data node (and they do). Indices vary in size, the largest being 2-3 billion docs.
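For reference, the awareness setup looks roughly like this (the attribute name `rack` and the values `A`/`B` are assumed from the description above; adjust to match the actual attribute):

```yaml
# elasticsearch.yml on each node -- the attribute value differs per physical rack
node.attr.rack: A        # or B on the other rack

# cluster-wide settings (can also be applied via the _cluster/settings API)
cluster.routing.allocation.awareness.attributes: rack
cluster.routing.allocation.awareness.force.rack.values: A,B
```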
The problem comes when running searches. All search load lands on the B nodes, with the A nodes seeing no activity at all. If we shut down the B nodes, the A nodes will serve searches, but shortly after the B nodes return, all the load goes back to B again.
As a test, I had queries use the `preference` query parameter, as:
`preference=_only_nodes:[list of all node IDs]`
In that case, all nodes will see load during queries. This certainly seems like it should not be needed, though.
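Concretely, the test queries looked something like this (the index name and node IDs here are placeholders, not our real values):

```
GET /some-index/_search?preference=_only_nodes:node-id-1,node-id-2,node-id-3
{
  "query": { "match_all": {} }
}
```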
As another test, I restarted all of the B nodes, which left all primary shards on the A nodes. This made no difference: as before, search requests only loaded up the B nodes.
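In case it helps diagnose this, the shard copies a search is routed to can be inspected with the `_search_shards` API (index name is again a placeholder); the `node` field of each returned copy shows which node holds it:

```
GET /some-index/_search_shards
```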
Any thoughts on what may be going on here? Is there a setting I may be overlooking?
Thank you for any help.
Edit: Forgot to mention, all nodes are on ES version 5.6.9 running on Ubuntu 18.04 using Oracle Java version 1.8.0_171-b11. Indices were all built on this version as well.