Guide on how to deploy ES in a high-availability configuration

We're looking at deploying Elasticsearch on EC2 to power the search of
our new product. After crawling elasticsearch.org for tutorials, I found
the EC2 tutorial, which was quite helpful, but I didn't find any guides on
how I should implement high availability for Elasticsearch.

All our current production systems are configured and deployed so that any
machine can fail and all traffic, requests, etc. are directed to the working
nodes, and any failover procedures have also been fully automated.
This results in high availability without any human intervention (the
techops team does monitoring, but they don't actually need to do much
because of the high degree of automation).

Now I'm wondering how I can do all this with Elasticsearch, but I don't have
any good pointers. I know that Elasticsearch is powered by Lucene and
the indices are distributed into shards, each of which has one master and one
or more replicas. What if a master fails? I found some info about master
election and also about the EC2 discovery module
at http://www.elasticsearch.org/guide/reference/modules/discovery/ but that
doesn't really tell me what happens when the master dies. Is there
additional documentation?

What about the RESTful endpoint? What if the machine running it dies? We
are running HAProxy in our production environment, so I could very well use
that in front, but I couldn't find any good guides on that topic either. Do
I have to configure some scripts which change DNS CNAMEs to point to the new
master, or can/should I use EC2 Elastic IP addresses?

All responses and links to relevant tutorials are greatly appreciated.

--


Hello,

First, there's a great presentation from Shay on the topic, available at
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html.

A couple of quick notes:

  • Elasticsearch is "highly available" by design and by default. As you
    write, shards are replicated across the cluster for better performance and
    availability. If a primary shard is not available (node goes down, etc),
    one replica will take over the role of the primary.

  • Elasticsearch clusters, again by default, "take care of themselves".
    With either multicast or a properly configured unicast topology, a new
    node with the correct cluster.name setting will join the cluster and
    start serving queries, storing shards, etc.

  • Elasticsearch clusters don't have a SPOF master node: any node which is
    configured to do so (which is the default) can become the master. Thus,
    if the master goes down, another node will take over the role.

  • Given you have enough nodes to place replicas on, your cluster is highly
    available by default. When a node goes down, unless you're running at
    capacity, you just launch another one, and it will take on the duties,
    without human intervention — apart from monitoring.

  • Any node can accept your HTTP requests and will automatically route them
    to the other nodes in the cluster as needed. If you use a single IP in
    your application code, though, that is a point of failure if that
    specific machine goes down. There are multiple ways to solve this. Some
    client libraries automatically perform requests in a round-robin fashion
    and detect healthy nodes. Another approach is to launch an Elasticsearch
    "client node" which serves as a proxy, automatically discovering new
    nodes in the cluster, etc. You could also use Nginx or HAProxy in a
    similar fashion, making sure you periodically update the list of nodes
    in its configuration.

  • EC2 Elastic IPs are not well suited for the purpose, since you are billed
    for the traffic and it goes over the external network (unless I'm
    mistaken or the rules have changed). The EC2 Elastic Load Balancer (ELB)
    has the same characteristics, unfortunately, in addition to being
    overloaded or not resilient enough at times. A dedicated
    Nginx/HAProxy/Pound/etc. proxy machine is a much better solution from
    the architectural point of view.
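
To make the "round-robin with health detection" idea from the notes above a bit
more concrete, here is a minimal sketch in plain Python (standard library only).
It is not how any particular client library does it, just an illustration of the
technique. The host names es1/es2/es3 are made up, and I'm assuming the nodes
listen for HTTP on the default port 9200 and answer a GET on the root URL with
a 200 when they are up.

```python
import itertools
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(node: str, timeout: float = 2.0) -> bool:
    """Return True if the node answers an HTTP GET on its root URL with 200."""
    try:
        with urllib.request.urlopen(node, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error, etc.
        return False


class RoundRobinNodes:
    """Cycle through a fixed list of nodes, skipping those that fail a probe."""

    def __init__(self, nodes: Iterable[str],
                 probe: Callable[[str], bool] = http_probe) -> None:
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._probe = probe
        self._cycle = itertools.cycle(self._nodes)

    def next_healthy(self) -> Optional[str]:
        """Return the next node that passes the probe, or None if all fail."""
        # Try each node at most once per call so we never loop forever.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if self._probe(node):
                return node
        return None


if __name__ == "__main__":
    # Hypothetical node addresses; replace with your own.
    pool = RoundRobinNodes([
        "http://es1:9200",
        "http://es2:9200",
        "http://es3:9200",
    ])
    print(pool.next_healthy())
```

Your application would call next_healthy() before each request and send the
request to the returned node. Note that the node list here is static; a real
setup would also need to refresh it when nodes are added or replaced, which is
the same caveat as with the Nginx/HAProxy approach.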

Karel

(In reply to Juho Mäkinen's post of Tuesday, December 11, 2012 12:39:39 PM UTC+1.)

--


Thanks. Nice list of ES HA reminders.

(In reply to Karel Minařík's post of Wednesday, December 12, 2012 3:22:24 AM UTC-7.)
