Providing HA/HP searching using ES

Shane_Allen · September 29, 2013, 4:33am

Hi,

I am setting up an ES cluster. I am trying to decide what is the best way
to provide HA/HP searching across our ES cluster. At the moment, I have two
ideas:

1. Provide a single set of active/passive proxy nodes and use them to handle
   all searches for all servers on our network. This is not my preferred
   option, because we're not making full use of the machines in our
   environment, and I'm worried about overloading a single proxy and having
   to install a set of active/passive machines, which feels even more
   inefficient to me.
2. Provide every web server in our network (currently around 7, potentially
   growing to 12-15 within the next year) with an ES client node joined to
   the cluster. This seems ideal to me, as each server can then talk to its
   own local client node to work out how to distribute queries, and if we
   lose a single server no other server is impacted. Additionally, loss of
   ES cluster storage nodes will have minimal impact.

My concern is that with #2 I might end up with 3-5 cluster nodes storing
data and processing searches, and another 10-15 client nodes that are part
of the cluster. I don't know enough about the internals of ES's clustering
to know if this number of clients will saturate the network, cause
unreliability in the heartbeat mechanism of ES, or cause other problems.

Does anyone have any feedback on #2, or any specific recommendations WRT #1?

Many thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · September 29, 2013, 4:59am

A third option is to use a TransportClient in the webapps. I'd prefer this option because the webapp does not join the cluster, so it won't send events that modify the cluster state.

My 2 cents.

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

jprante · September 29, 2013, 8:14am

1. Have you tested whether a single proxy actually gets overloaded? I'm not
   sure which client transport you prefer. If you use HTTP, just use an HTTP
   proxy setup (nginx, Apache, ...), configure HTTP HA, and expose the
   proxies separately from the cluster nodes.
2. Data-less client nodes are an option for special situations. They are
   common where in-memory result sets need to scale: consider huge result
   set aggregations that you want to move off the cluster's data nodes. They
   do not saturate the network more than other setups.
3. I also prefer David's option if you go with Java. The setup is an array
   of TransportClients that handle the external requests in a frontend (you
   could also add nginx), while the number of cluster nodes in the backend
   can scale separately from the frontend. You get both 1. and 2. in one
   solution.
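
To make the point about data-less client nodes concrete: in a scatter/gather search, the node that receives the request has to hold and merge the partial results from every shard before it can reply. The sketch below is a simplified illustration of that merge step in plain Python, not Elasticsearch code. The function name `merge_top_hits`, the `(score, doc_id)` tuple shape, and the shard data are all hypothetical.

```python
import heapq
from itertools import islice


def merge_top_hits(shard_results, size):
    """Merge per-shard hit lists into one global top-`size` list.

    shard_results: list of lists of (score, doc_id) tuples, each list
    already sorted by descending score (a hypothetical shape chosen for
    this illustration).
    """
    # heapq.merge streams already-sorted inputs; with reverse=True it
    # expects every input to be sorted from largest to smallest.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


# Hypothetical partial results from three shards.
shard_a = [(9.1, "a1"), (4.0, "a2")]
shard_b = [(7.5, "b1"), (7.2, "b2")]
shard_c = [(8.3, "c1")]

top3 = merge_top_hits([shard_a, shard_b, shard_c], size=3)
```

The per-shard lists all sit in memory on the merging node at once, so with very large result sets this is the cost that a data-less client node keeps away from the data nodes.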

To complete the HA scenario, consider failover mechanisms in the frontend.
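
A minimal sketch of what failover in the frontend could look like, assuming the frontend knows a fixed list of endpoints (proxies or cluster nodes). `FailoverPool`, `cooldown_seconds`, and the `send` callback are hypothetical names for illustration and not part of any Elasticsearch client. A TransportClient configured with several addresses, or an nginx upstream block, would normally provide similar behaviour without custom code.

```python
import itertools
import time


class NoEndpointAvailable(Exception):
    """Raised when every endpoint failed or is still cooling down."""


class FailoverPool:
    """Round-robin over endpoints, skipping ones that failed recently."""

    def __init__(self, endpoints, cooldown_seconds=30.0, clock=time.monotonic):
        self._endpoints = list(endpoints)
        self._cooldown = cooldown_seconds
        self._clock = clock
        # Earliest time at which each endpoint may be tried again.
        self._retry_at = {ep: float("-inf") for ep in self._endpoints}
        self._cycle = itertools.cycle(self._endpoints)

    def call(self, send):
        """Try send(endpoint) on each healthy endpoint until one succeeds.

        `send` is a caller-supplied function (an assumption of this sketch)
        that performs the request and raises OSError on connection problems.
        """
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            if self._clock() < self._retry_at[endpoint]:
                continue  # still cooling down after an earlier failure
            try:
                return send(endpoint)
            except OSError:
                # Connection refused, timeout, etc.: bench this endpoint.
                self._retry_at[endpoint] = self._clock() + self._cooldown
        raise NoEndpointAvailable("all endpoints failed or are cooling down")
```

A web server would create one pool at startup, for example `FailoverPool(["es1:9200", "es2:9200"])` with placeholder host names, and route every search through `pool.call(...)`. Losing one endpoint then costs at most one failed attempt per cooldown period.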

Jörg
