I am setting up an ES cluster and am trying to decide on the best way to
provide HA/HP searching across it. At the moment, I have two ideas:
1. Provide a single set of active/passive proxy nodes and use them to
handle all searches for all servers on our network. This is not my
preferred option, because it does not make full use of the machines in our
environment, and I'm worried about overloading a single proxy and having to
install a set of active/passive machines, which feels even more inefficient
to me.
2. Provide every web server in our network (currently around 7, potentially
growing to 12-15 within the next year) with an ES client node, joined to
the cluster. This seems ideal to me, as each server can then talk to its
own local client node to figure out how to distribute queries, and if we
lose a single server no other server is impacted. Additionally, the loss of
ES cluster storage nodes will have minimal impact.
My concern is that with #2 I might end up with 3-5 cluster nodes storing
data and processing searches, and another 10-15 client nodes that are part
of the cluster. I don't know enough about the internals of ES's clustering
to know whether this number of clients will saturate the network, make the
ES heartbeat mechanism unreliable, or cause other problems.
Does anyone have any feedback on #2, or any specific recommendations WRT #1?
A 3rd option would be to use a TransportClient in the webapps. I'd prefer this option, as the "webapp" then won't send events that modify the cluster state.
My 2 cents.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 Sept. 2013 at 07:33, Shane Allen opiate@gmail.com wrote:
Hi,
I am setting up an ES cluster. [...]
Have you tested whether a single proxy actually gets overloaded? I'm not
sure which client transport you prefer. If you use HTTP, just use a
standard HTTP proxy setup (nginx, Apache, ...), configure HTTP HA, and
expose the proxies separately from the cluster nodes.
Data-less client nodes are an option for special situations. They are
common where large result sets have to be handled in memory: consider huge
result set aggregations that you want to move off the cluster data nodes.
They do not saturate the network more than other setups.
I also prefer David's option if you go with Java. The setup is an array
of TransportClients that handle the external requests in a frontend (you
could also add nginx), while the number of cluster nodes in the backend can
scale separately from the frontend. You get both 1. and 2. in one solution.
To complete the HA scenario, consider failover mechanisms in the frontend.
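As a rough illustration of what a failover mechanism in the frontend could look like, here is a minimal Java sketch that rotates over a fixed list of endpoints and falls through to the next one when a request throws. It uses no Elasticsearch API at all: `FailoverCaller`, the `request` function, and the endpoint strings are hypothetical names made up for this example, and a real setup would likely also need timeouts and health checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper, not part of Elasticsearch: round-robin over a fixed
// list of endpoints, trying the next endpoint whenever a request throws.
final class FailoverCaller {
    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    FailoverCaller(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("need at least one endpoint");
        }
        this.endpoints = new ArrayList<>(endpoints);
    }

    // Runs the request against one endpoint, starting at the round-robin
    // position; on a RuntimeException it moves on to the next endpoint.
    <T> T call(Function<String, T> request) {
        int size = endpoints.size();
        int start = Math.floorMod(next.getAndIncrement(), size);
        RuntimeException last = null;
        for (int i = 0; i < size; i++) {
            String endpoint = endpoints.get((start + i) % size);
            try {
                return request.apply(endpoint);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next endpoint
            }
        }
        throw new IllegalStateException("all endpoints failed", last);
    }
}
```

A webapp would wrap each search in `caller.call(endpoint -> ...)`, where the body sends the request to that endpoint (the proxy or node addresses themselves are assumptions here). Spreading the starting position across calls also distributes load when all endpoints are healthy.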
Jörg
On Sun, Sep 29, 2013 at 6:59 AM, David Pilato david@pilato.fr wrote:
A 3rd option could be using TransportClient in webapps. [...]