Query Routing Issue


We have an ES cluster with 19 nodes in total (3 master-only, 16 data-only) in two racks, A and B. Allocation is forced by rack via cluster.routing.allocation.awareness.force.rack.values, which is working properly (a primary and its replica never end up on the same rack). We can, for example, shut down all A nodes and maintain yellow status. In total, we have 46 indices, each with 32 primary shards and 1 replica, with a per-node shard limit of 4, so that each index has 4 shards on every node (they do). Indices vary in size, with the largest holding 2-3 billion docs.

The problem comes when running searches. All search load lands on B nodes, with A nodes seeing no activity at all. If we shut down the B nodes, the A nodes will perform searches, but shortly after the B nodes return, all search load goes back to B again.

As a test, I had queries use the preference query parameter, as:
preference=_only_nodes:[list of all node IDs]

In that case, all nodes will see load during queries. This certainly seems like it should not be needed, though.
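For reference, the test above could be scripted roughly as follows. This is only a sketch of building the request URL: the host, the index name and the node IDs are placeholders, not values from our cluster, and the helper name is made up for illustration.

```python
from urllib.parse import urlencode


def only_nodes_search_url(base_url, index, node_ids):
    """Build a _search URL that restricts execution to the given node IDs.

    base_url, index and node_ids are placeholders supplied by the caller.
    """
    # _only_nodes takes a comma-separated list of node IDs.
    preference = "_only_nodes:" + ",".join(node_ids)
    # urlencode percent-encodes ':' and ',' so the value survives as one parameter.
    return f"{base_url}/{index}/_search?" + urlencode({"preference": preference})


# Hypothetical host, index and node IDs, for illustration only.
print(only_nodes_search_url("http://localhost:9200", "my-index", ["nodeA1", "nodeB1"]))
```

The resulting URL can then be used with whatever HTTP client already sends the searches.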

As another test, I restarted all B nodes, which left all primary shards on A nodes. This made no difference: search requests went back to loading only B nodes.

Any thoughts of what may be going on here? A setting I may be overlooking?

Thank you for any help.

Edit: Forgot to mention, all nodes are on ES version 5.6.9 running on Ubuntu 18.04 using Oracle Java version 1.8.0_171-b11. Indices were all built on this version as well.

I think this is explained in the docs:

When executing search or GET requests, with shard awareness enabled, Elasticsearch will prefer using local shards — shards in the same awareness group — to execute the request. This is usually faster than crossing between racks or across zone boundaries.

I think you should be able to override this if you specify a custom preference value, e.g. timestamp string, with each query.
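A rough sketch of what I mean, assuming a fresh timestamp-based string is generated for every query. The host, index name and helper name are made up, and I have not confirmed that a custom value actually bypasses the rack-local preference on your version, so treat it as something to test rather than a known fix.

```python
import time
from urllib.parse import urlencode


def custom_preference_search_url(base_url, index, preference=None):
    """Build a _search URL carrying a custom preference string.

    If no preference is given, a timestamp-based value is generated so that
    each query gets a different string. All names here are placeholders.
    """
    if preference is None:
        # Custom preference values must not start with '_', which is
        # reserved for built-in preferences such as _only_nodes.
        preference = f"q-{time.time_ns()}"
    return f"{base_url}/{index}/_search?" + urlencode({"preference": preference})


# Hypothetical host and index, for illustration only.
print(custom_preference_search_url("http://localhost:9200", "my-index"))
```

Keep in mind that reusing the same string sends repeated queries to the same shard copies, so the value has to vary between queries to spread load.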

You are correct, and that is documented behavior. My mistake for missing that.

I am able to more simply use all nodes via setting preference=_only_nodes:*

In the kindest possible way, though, I must say this default seems less than ideal: it leaves a large portion (likely half or more) of the available hardware unused on the assumption that crossing racks might increase network latency. In our case, I know the latency to and from either rack is identical, so performance is substantially degraded by this default.

I also do not see a way to adjust this default behavior, for example a cluster setting that configures this routing behavior for queries while leaving allocation behavior as-is. An allocation setting that also changes query-level routing seems rather subtle, apart from the note pointed out on that page.

Happy to have this resolved, and this has inspired me to try to get more involved in the ES community to provide feedback like this.

Thank you for the help!

I guess just distributing queries across both zones would also lead to a more even distribution.
