I have a question about query routing inside a cluster. Let's say a node holds a full replica of the data and can satisfy every search request it receives. Will it redirect/reroute queries to other nodes in the cluster under high load?
The motivation is the following: I have 3 nodes distributed across availability zones, and 10 indices with 1 shard and 2 replicas each.
Now, in each zone there is a group of machines that connect to the node in the same zone via the TransportClient. Sniffing is turned off to reduce network latency and to avoid requests from the clients to nodes in other zones.
I'm wondering whether nodes may (for some reason) redirect queries to other nodes in the cluster, even if they hold all the data needed for every request.
If so, is there any way to avoid this behavior? It produces additional hops that affect search time.
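The premise that one node can answer every query on its own follows from the layout: 1 primary plus 2 replicas gives 3 copies of each shard, and Elasticsearch does not allocate two copies of the same shard to the same node, so with 3 nodes every node ends up with one copy of every shard. The Java sketch below is a toy model that only checks this counting argument. The class and method names are hypothetical, and the rotating placement is an illustration, not Elasticsearch's actual allocator.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Elasticsearch code): places every copy of every shard on a
 * node, never putting two copies of the same shard on one node.
 */
public class ShardAllocation {

    /** Returns, for each node, the list of "index/shard" ids it holds. */
    static List<List<String>> allocate(int nodes, int indices, int shardsPerIndex, int replicas) {
        int copies = 1 + replicas; // one primary plus its replicas
        if (copies > nodes) {
            // in a real cluster the extra copies would simply stay unassigned
            throw new IllegalArgumentException("more copies than nodes");
        }
        List<List<String>> perNode = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            perNode.add(new ArrayList<>());
        }
        int next = 0; // rotating start node spreads primaries evenly
        for (int i = 0; i < indices; i++) {
            for (int s = 0; s < shardsPerIndex; s++) {
                for (int c = 0; c < copies; c++) {
                    // consecutive copies land on distinct nodes because copies <= nodes
                    perNode.get((next + c) % nodes).add("index" + i + "/" + s);
                }
                next = (next + 1) % nodes;
            }
        }
        return perNode;
    }

    public static void main(String[] args) {
        List<List<String>> perNode = allocate(3, 10, 1, 2);
        for (int n = 0; n < perNode.size(); n++) {
            // prints 10 for each of the 3 nodes
            System.out.println("node" + n + " holds " + perNode.get(n).size() + " shard copies");
        }
    }
}
```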
Only nodes that hold a shard (primary or replica) of an index you specified
participate in handling queries. Replicas are selected round-robin, AFAIK
(or maybe in random order?). Which nodes a query is sent to is determined
once, by the node within the cluster that receives the query. That is the
node the TransportClient is connected to, because the TransportClient is
not able to keep a copy of the cluster state. This proxy node also collects
the results from the shards and builds the response. There is no forwarding
or redirecting from one node to another after the proxy node has dispatched
the query down to the shard level.
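One consequence of round-robin selection bears on the follow-up question in this thread: even when the node the TransportClient talks to holds a copy of the shard, a plain rotation over all copies sends most requests to other nodes. The sketch below models that, assuming strict round-robin as described (Elasticsearch may shuffle copies rather than rotate them); the class, method and node names are hypothetical. With three copies, only one request in three stays on the local node. Elasticsearch's search API also has a `preference` parameter (for example `_local`) for steering which copy is used; check its exact behavior against the docs for your version.

```java
import java.util.List;

/**
 * Toy model (not Elasticsearch code) of a coordinating node that picks one
 * copy of a shard per request by rotating round-robin over all copies.
 */
public class CopySelection {

    private final List<String> copyNodes; // nodes holding a copy of the shard
    private int counter = 0;              // advances on every request

    CopySelection(List<String> copyNodes) {
        this.copyNodes = copyNodes;
    }

    /** Round-robin pick: ignores which node is doing the coordinating. */
    String pick() {
        String chosen = copyNodes.get(counter % copyNodes.size());
        counter++;
        return chosen;
    }

    /** Counts how many of the given requests are served by the local node. */
    static int localHits(String localNode, List<String> copyNodes, int requests) {
        CopySelection selector = new CopySelection(copyNodes);
        int hits = 0;
        for (int r = 0; r < requests; r++) {
            if (selector.pick().equals(localNode)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> copies = List.of("node-a", "node-b", "node-c");
        int hits = localHits("node-a", copies, 9);
        System.out.println(hits + " of 9 requests stayed on node-a"); // 3 of 9
    }
}
```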
Jörg and Clinton, thank you very much for such quick answers.
Jörg, just to make sure that I've got things right.
Assume the node that the TransportClient is connected to holds a shard of
the index specified in the query, and no cluster.routing.allocation.awareness
is applied. Can this proxy node still decide to execute the request on a
shard copy located on another node (for example, to round-robin across
replicas)?
Clinton, I'm not using cluster.routing.allocation.awareness yet. Thank you
for the hint!
Savva
On Wednesday, May 14, 2014 10:08:24 PM UTC+3, Clinton Gormley wrote:
Hi Savva
I presume you're using cluster.routing.allocation.awareness? If so, then
shards on nodes with the same node attributes are preferred:
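Clinton's point can be modeled as an ordering step: the coordinating node ranks shard copies on nodes that share its awareness attribute (here a zone) ahead of the rest, so the first copy tried is in the local zone whenever one exists. This is a hypothetical sketch; the attribute name `zone`, the node names and the class are illustrative, and the actual implementation may differ, for example in how it rotates among several same-zone copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Elasticsearch code): copies on nodes whose "zone" attribute
 * matches the coordinating node's zone are tried before all other copies.
 */
public class AwarenessOrdering {

    /** Returns copy nodes with same-zone nodes first, otherwise keeping their order. */
    static List<String> order(String coordinatingZone, List<String> copyNodes, Map<String, String> zoneOf) {
        List<String> sameZone = new ArrayList<>();
        List<String> otherZones = new ArrayList<>();
        for (String node : copyNodes) {
            if (coordinatingZone.equals(zoneOf.get(node))) {
                sameZone.add(node);
            } else {
                otherZones.add(node);
            }
        }
        sameZone.addAll(otherZones); // fall back to other zones only after local ones
        return sameZone;
    }

    public static void main(String[] args) {
        Map<String, String> zoneOf = Map.of("node1", "a", "node2", "b", "node3", "c");
        List<String> copies = List.of("node1", "node2", "node3");
        System.out.println(order("b", copies, zoneOf)); // [node2, node1, node3]
    }
}
```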