Hello,
I am encountering what seems to be unexpected behavior with the preference parameter of the /_search API when using the _prefer_nodes value. Here is the setup and the problem:
Cluster Configuration:
- Elasticsearch version: 8.15.1
- The cluster is configured across three zones: eu1, eu2, and eu3.
- Each index is configured so that every shard exists in all zones:
curl -XPUT -u "$ESU" -H 'Content-Type: application/json' "$ES/*/_settings" -d '{
"number_of_replicas": 2
}'
- Each node is assigned its respective zone using a node attribute, and allocation awareness is enabled on that attribute:
cluster.routing.allocation.awareness.attributes: "zone"
cluster.routing.allocation.awareness.force.zone.values: eu1,eu2,eu3
node.attr.zone: eu1 # or eu2, eu3 depending on the zone
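Before comparing preference values, it can help to confirm that the attributes and shard copies are actually laid out as described. The sketch below uses the cat node attributes and cat shards APIs; it assumes $ES and $ESU are set as in the commands above, and check_zone_layout is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the zone setup before testing preferences.
# Assumes $ES and $ESU are set as in the post; check_zone_layout is a
# hypothetical helper name, and tmp_test is the default index to inspect.
set -euo pipefail

check_zone_layout() {
  local index="${1:-tmp_test}"
  # One row per node attribute: every node should report a "zone" value.
  curl -s -u "$ESU" "$ES/_cat/nodeattrs?v&h=node,attr,value"
  # One row per shard copy: with 2 replicas, each shard number should show
  # 3 STARTED copies; cross-reference the node column with the listing above
  # to confirm that each zone holds one copy.
  curl -s -u "$ESU" "$ES/_cat/shards/${index}?v&h=index,shard,prirep,state,node"
}

# Usage: check_zone_layout tmp_test
```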
What Works:
When I perform a /_search query with preference=_only_nodes:zone:eu1, it works as expected. Only the nodes in eu1 respond to the query.
Example:
curl -sXGET -H 'Content-Type: application/json' -u "$ESU" "$ES/tmp_test/_search?preference=_only_nodes:zone:eu1" -d '{
"explain": true,
"query": {
"match_all": {}
}
}'
Result: The query is handled exclusively by nodes in the eu1 zone.
What Doesn't Work:
When I perform a /_search query with preference=_prefer_nodes:zone:eu1, nodes from other zones (eu2, eu3) also respond to the query, even though the data is available on nodes in eu1.
Example:
curl -sXGET -H 'Content-Type: application/json' -u "$ESU" "$ES/tmp_test/_search?preference=_prefer_nodes:zone:eu1" -d '{
"explain": true,
"query": {
"match_all": {}
}
}'
Result: Nodes from zones eu2 and eu3 respond alongside nodes from eu1, contrary to the expected behavior.
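One way to tally which nodes answered is to read the _node field that Elasticsearch adds to each hit (alongside _shard) when "explain": true is set, as in the requests above. The sketch assumes jq is installed and reuses $ES and $ESU; hit_nodes is a hypothetical helper name. Only nodes that contributed returned hits show up, so a larger size gives a more complete picture than the default of 10 hits.

```shell
#!/usr/bin/env bash
# Sketch: list the distinct node IDs that served the returned hits.
# Assumes jq is installed; hit_nodes is a hypothetical helper name.
set -euo pipefail

# Reads a _search response (run with "explain": true) on stdin and prints
# each distinct _node value once, sorted. Hits without _node are skipped.
hit_nodes() {
  jq -r '[.hits.hits[]._node // empty] | unique | .[]'
}

# Usage (assumes $ES and $ESU as in the post):
# curl -s -H 'Content-Type: application/json' -u "$ESU" \
#   "$ES/tmp_test/_search?preference=_prefer_nodes:zone:eu1" \
#   -d '{"explain": true, "size": 100, "query": {"match_all": {}}}' | hit_nodes
```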
Expected Behavior:
From the documentation, my understanding is that _prefer_nodes should prioritize nodes matching the specified attribute (in this case, eu1) and only fall back to other nodes if necessary (e.g., if the data is not available in eu1).
Since the data exists on nodes in eu1, I would expect only nodes from eu1 to handle the query.
Question:
Is this the intended behavior of _prefer_nodes, or is it potentially a bug?
Are there additional settings or considerations I might have overlooked in configuring this feature?
Thank you for your help! Let me know if I can provide more details or logs to assist with troubleshooting.
EDIT:
When I use _prefer_nodes but specify the node IDs directly (instead of using the zone attribute), it also works as expected, prioritizing the specified nodes.
Example:
curl -X GET -H 'Content-Type: application/json' -u "$ESU" "$ES/tmp_test/_search?preference=_prefer_nodes:2LS8vyq9TCGV-sS2RgkA9w,YnYSGPPaTHOCT0GHUQN9nw" -d '{
"explain": true,
"query": {
"match_all": {}
}
}'
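The EDIT shows _prefer_nodes prioritizing the right nodes when it is given node IDs. If zone-based routing is the goal, one possible workaround (an assumption, not a confirmed fix) is to look up the IDs of the nodes whose zone attribute matches and pass those instead. The sketch reads node attributes from the nodes info API (GET _nodes, trimmed with filter_path) and filters them with jq; it assumes jq is installed and $ES / $ESU are set as in the post, and zone_node_ids and search_prefer_zone are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: resolve a zone attribute to the comma-separated node-ID list that
# _prefer_nodes accepted in the EDIT above. Assumes jq is installed and that
# $ES / $ESU are set as in the post. Function names are hypothetical.
set -euo pipefail

# Reads a GET _nodes response on stdin and prints the IDs of nodes whose
# "zone" attribute equals $1, sorted and joined with commas.
zone_node_ids() {
  jq -r --arg zone "$1" \
    '[.nodes | to_entries[] | select(.value.attributes.zone == $zone) | .key] | sort | join(",")'
}

# Runs the match_all test search, preferring the nodes of zone $1.
search_prefer_zone() {
  local zone="$1" ids
  ids=$(curl -s -u "$ESU" "$ES/_nodes?filter_path=nodes.*.attributes" | zone_node_ids "$zone")
  if [ -z "$ids" ]; then
    echo "no nodes found with zone=$zone" >&2
    return 1
  fi
  curl -s -H 'Content-Type: application/json' -u "$ESU" \
    "$ES/tmp_test/_search?preference=_prefer_nodes:${ids}" \
    -d '{"explain": true, "query": {"match_all": {}}}'
}

# Usage: search_prefer_zone eu1
```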