Providing HA/HP searching using ES

Hi,

I am setting up an ES cluster and trying to decide on the best way to
provide HA/HP searching across it. At the moment, I have two ideas:

  1. Provide a single set of active/passive proxy nodes and use them to
    handle all searches for all servers on our network. This is not my
    preferred option: it does not make full use of the machines in our
    environment, and I'm worried about overloading a single proxy and having
    to install a set of active/passive machines, which feels even more
    inefficient to me.

  2. Provide every web server in our network (currently around 7, potentially
    growing to 12-15 within the next year) with an ES client node, joined to
    the cluster. This seems ideal to me, as each web server can then talk to
    its own local client node to figure out how to distribute queries, and if
    we lose a single server no other server is impacted. Additionally, loss of
    ES cluster storage nodes will have minimal impact.

My concern is that with #2 I might end up with 3-5 cluster nodes storing
data and processing searches, and another 10-15 client nodes that are part
of the cluster. I don't know enough about the internals of ES's clustering
to know if this number of clients will saturate the network, cause
unreliability in the heartbeat mechanism of ES, or cause other problems.

Does anyone have any feedback on #2, or any specific recommendations WRT #1?

Many thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

A 3rd option could be using TransportClient in the webapps. I'd prefer this option because a TransportClient connects to the cluster without joining it, so the "webapp" won't send events that modify the cluster state.

My 2 cents.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to Shane Allen opiate@gmail.com, 29 Sept 2013 at 07:33)

  1. Have you tested whether a single proxy actually gets overloaded? I'm not
    sure which client transport you prefer. If you use HTTP, just use an HTTP
    proxy setup (nginx, Apache, ...), configure HTTP HA, and expose the
    proxies separately from the cluster nodes.

  2. Data-less client nodes are an option for special situations. They are
    common where in-memory result set handling needs to scale: consider huge
    result set aggregations that you want to move off the cluster data nodes.
    They do not saturate the network more than other setups.

  3. I also prefer David's option if you go with Java. The setup is an array
    of TransportClients that handle the external requests in a frontend (you
    could also add nginx), while the number of cluster nodes in the backend
    can scale separately from the frontend. You get both 1. and 2. in one
    solution.
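
The frontend/backend split in point 3 can be sketched roughly as follows. This is only an illustration of the idea, not the TransportClient API: a frontend keeps its own list of backend node addresses and rotates over them, and the list can be swapped out as the backend grows, without the frontend ever joining the cluster. The class name NodeRotation and the host names are made up for the example.

```python
import threading


class NodeRotation:
    """Round-robin over backend addresses (hypothetical helper, not an ES API)."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one backend address is required")
        self._lock = threading.Lock()
        self._addresses = list(addresses)
        self._index = 0

    def next_address(self):
        # Hand out addresses in turn, wrapping around at the end of the list.
        with self._lock:
            address = self._addresses[self._index % len(self._addresses)]
            self._index += 1
            return address

    def replace_addresses(self, addresses):
        # The backend can grow or shrink; the frontend only swaps its list.
        if not addresses:
            raise ValueError("at least one backend address is required")
        with self._lock:
            self._addresses = list(addresses)
            self._index = 0


# Hypothetical host names, for illustration only.
rotation = NodeRotation(["es1:9300", "es2:9300"])
first_three = [rotation.next_address() for _ in range(3)]
```

Each web server would hold one such rotation, so adding a data node means updating a list on the frontends rather than adding another cluster member.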

To complete the HA scenario, consider failover mechanisms in the frontend.
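
One possible shape for such a frontend failover, as a minimal sketch: try each known backend in order and return the first successful answer. The function name call_with_failover, the fake_search stand-in and the host names are assumptions made for illustration; a real frontend would pass in a function that performs the actual HTTP or transport request.

```python
def call_with_failover(addresses, request):
    """Call request(address) on each address in order; return the first success."""
    errors = []
    for address in addresses:
        try:
            return request(address)
        except OSError as exc:
            # OSError covers refused connections, socket timeouts and similar
            # network failures; remember the error and move on to the next node.
            errors.append((address, exc))
    raise ConnectionError("all backends failed: %r" % (errors,))


def fake_search(address):
    # Stand-in for a real request; pretends the first node is down.
    if address == "es1:9200":
        raise ConnectionRefusedError("node is down")
    return "results from " + address


result = call_with_failover(["es1:9200", "es2:9200"], fake_search)
```

Only network-level errors trigger the next attempt here; errors raised by the search itself are deliberately left to propagate to the caller.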

Jörg

(In reply to David Pilato david@pilato.fr, Sun, Sep 29, 2013 at 6:59 AM)
