Rancher 1.6.10, Docker 1.12.*, Elasticsearch 5.5.1 (without X-Pack).
15 ES data nodes spread across 5 Rancher nodes (each Rancher node is a bare-metal server).
3 ES data nodes per Rancher node, organized as 3 Rancher "services" of 5 containers each. Each ES data node stores its data on a dedicated local filesystem on its Rancher node.
3 ES master nodes, currently on Rancher nodes 1, 4, and 5.
Each Rancher node is set up as a separate "zone" in ES, to keep the ES cluster resilient if a single Rancher node should crash.
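Since the "zone" setup is central to the question, here is roughly what the relevant configuration looks like (a sketch only; the attribute value and comments are illustrative, not copied verbatim from our config):

```yaml
# elasticsearch.yml on every ES node running on a given Rancher node
# (the zone value matches that Rancher node's short hostname)
node.attr.zone: rancher-node-2

# On the cluster (static in elasticsearch.yml or dynamic via _cluster/settings):
# shard allocation awareness spreads primary/replica copies ACROSS zones;
# it is not supposed to restrict indexing to any single zone.
cluster.routing.allocation.awareness.attributes: zone
```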
A Rancher LB proxy on each Rancher node, pointing to 1 of 2 nginx web servers, currently on Rancher nodes 1 and 2.
The nginx web servers point to the 3 ES master nodes (load has been light enough that dedicated client/coordinating ES nodes haven't appeared necessary... yet).
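For context, the nginx side is essentially a plain reverse proxy across the masters, something like the following (hostnames and ports are placeholders, not our actual config):

```nginx
upstream es_masters {
    server es-master-1:9200;
    server es-master-2:9200;
    server es-master-3:9200;
}

server {
    listen 9200;
    location / {
        proxy_pass http://es_masters;
    }
}
```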
A user has submitted 6 indexing jobs against the Rancher LB proxy on Rancher node 2.
Here is the part that makes no sense. The 3 data nodes on Rancher node 2 appear to be doing all of the CPU-intensive indexing, rather than the work being distributed across all the data nodes, as one would expect the ES master nodes to arrange after receiving the bulk indexing requests from one of the two nginx web servers. The "zone" identities happen to match the bare-metal servers' short hostnames. It seems awfully coincidental that the hostname the user targeted is also the Rancher node where all of the CPU-intensive indexing is concentrated... as if indexing were somehow restricted to that "zone". Note that I have verified via _cat/shards that the 6 indexes actively being processed have their shards "properly" spread across the 15 ES data nodes on all 5 Rancher nodes.
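For reference, this is roughly how I tallied the _cat/shards output to confirm the shards are spread out (a quick sketch; the index and node names in the sample are made up):

```python
from collections import Counter

def shards_per_node(cat_shards_output):
    """Tally started shard copies per node from plain-text `_cat/shards` output.

    Default ES 5.x columns: index shard prirep state docs store ip node.
    Unassigned shards have no ip/node columns and are skipped.
    """
    counts = Counter()
    for line in cat_shards_output.strip().splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[3] == "STARTED":
            counts[fields[7]] += 1
    return counts

# Made-up sample in the _cat/shards format, for illustration only.
sample = """\
idx-1 0 p STARTED 1000 1mb 10.0.0.1 data-node-1
idx-1 0 r STARTED 1000 1mb 10.0.0.2 data-node-4
idx-1 1 p STARTED 1000 1mb 10.0.0.3 data-node-7
idx-1 1 r UNASSIGNED
"""
print(shards_per_node(sample))
```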
Why is all the CPU being consumed on a single Rancher node, restricted to its 3 ES data nodes? This is a disconcerting condition that could overload that node's system resources. I have not found any documentation indicating this behavior is even possible with the given configuration, i.e. we are NOT restricting indexed data to a SINGLE zone.