Recommended setup & configuration for 3 servers

Hi, I was wondering if anyone could advise me, or point me in the direction
of the relevant documentation, on how best to set up elasticsearch to run
across 3 servers.
I'd like the 3 instances on the 3 servers to be replicas of each other, to
fail over to each other automatically, and to recover and rebuild
automatically if one of them falls over.
Thanks very much for all advice,
Doug.

I guess the best option is to use load balancers, so that even if one machine
fails, it fails over to another.
Next, set the number of replicas to 1, so that even if one node fails, another
takes over.
Setting it to 2 in this context will also help.

Thanks
Vineeth

On Mon, 2011-10-17 at 16:42 +0530, Vineeth Mohan wrote:

I guess the best option is to use load balancers, so that even if one
machine fails, it fails over to another.

No need for a load balancer, as long as your client knows about all 3
servers and knows to try the next server in the list if the current
server fails.

The Perl API will do this automatically. I think the Java API will too.
Your mileage may vary with other clients.

Next, set the number of replicas to 1, so that even if one node fails,
another takes over.
Setting it to 2 in this context will also help.

Setting replicas to 2 would mean that all 3 machines have all of your
data, so that 2 machines could die at the same time, and the third would
still have all data.

With 1 replica, there will be 2 copies of all your data. If one server
dies, and your cluster has enough time (which depends on how much data you
have) to redistribute your shards, then you will be fine.

If 2 servers die at the same time, then you will be missing data.
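
For illustration, creating the index with 2 replicas over the HTTP API could
look something like this. It's only a minimal Python sketch; the host name
"server1" and the index name "myindex" are placeholders:

```python
# Minimal sketch: create an index with 2 replicas via the HTTP API.
import json
import urllib.request

settings = json.dumps({
    "settings": {
        "number_of_shards": 5,     # the default shard count
        "number_of_replicas": 2    # each shard copied to the other 2 nodes
    }
}).encode("utf-8")

req = urllib.request.Request(
    "http://server1:9200/myindex",
    data=settings,
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode())
```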

clint

Will Elasticsearch take care of the load balancing part?
Say there are two machines, X and Y, with the same data. If X sees that Y is a
little more idle than itself, will it redistribute the load to Y?
If so, how?
I'd appreciate it if you could point me to some documentation.

Thanks
Vineeth

The Perl API will do this automatically. I think the Java API will too.
Yes. The Java API does it perfectly!

Thanks for those responses.
So would I be wrong in assuming that what I want to achieve can be done
out-of-the-box with a few configuration options in elasticsearch?
Incidentally, I'm using the HTTP API.

On Mon, 2011-10-17 at 13:23 +0100, doug livesey wrote:

Thanks for those responses.
So would I be wrong in assuming that what I want to achieve can be
done out-of-the-box with a few configuration options in elasticsearch?

No, you'd be correct in assuming that it works out of the box :-)

ES clusters automatically. So the only change you might need to make is
to change your indices from having 1 replica to 2, but that is up to
you. With 1 replica the load is distributed; with 2 replicas, all 3 nodes
have the same data.
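
If the index already exists, the replica count can be changed afterwards
through the index settings endpoint. A rough Python sketch (again, the host
and index name are just placeholders):

```python
# Sketch: change an existing index from 1 replica to 2 via the settings API.
import json
import urllib.request

body = json.dumps({"index": {"number_of_replicas": 2}}).encode("utf-8")
req = urllib.request.Request(
    "http://server1:9200/myindex/_settings",
    data=body,
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(urllib.request.urlopen(req).read().decode())
```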

Incidentally, I'm using the HTTP API.

OK - so your client/application code needs to know about all 3 servers,
and to try the next server in the list if the current server isn't
working.

That's the only bit that you need to handle yourself.

clint

I think you should ask ES, via the admin REST API, for information about the
nodes in the cluster (every 5 minutes, for example) and then use the first
node as your main "server node".
If it fails, use the second one.
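
Something along these lines, say. It's only a rough Python sketch of that
polling idea; the endpoint shown is the nodes-info API (/_cluster/nodes on
the releases of that era, /_nodes on later ones) and the host name is a
placeholder:

```python
# Sketch: ask the cluster which nodes it currently knows about, so the
# application can prefer the first and fall back to the next on failure.
import json
import urllib.request

def live_nodes(seed="http://server1:9200"):
    """Return the node entries the cluster currently knows about."""
    with urllib.request.urlopen(seed + "/_cluster/nodes", timeout=5) as resp:
        info = json.load(resp)
    # each entry carries, among other things, the node's HTTP address
    return list(info["nodes"].values())
```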

HTH
David ;-)

Ah, so if I wanted automatic failover, etc., I'd need to be using a client
(it would be Ruby in my case).

Wouldn't it be a better idea to use a load balancer instead of having ES do it?
In that case, a change (like adding a new ES node or bringing down the
master) needs to be made in only one place, right?

Thanks
Vineeth

If you use the default Elasticsearch client (the Node one, not the
Transport one), your client will act just like another elasticsearch
node and will be aware of added/removed nodes, of how best to route
your queries, etc. It's the best way to go.

Jérémie

So if I did this:

  1. Set up elasticsearch on my 3 servers, which are on the same network
  2. Gave them the same cluster.name
  3. Set node.master and node.data to be true for each of them
  4. Told the index I was using to have 3 replicas

That wouldn't achieve what I wanted?

100% agree!

David ;-)

On 17 Oct 2011, at 15:05, Jérémie BORDIER jeremie.bordier@gmail.com wrote:

If you use the default Elasticsearch client (the Node one, not the
Transport one), your client will act just as another elasticsearch
node and will be aware of added/removed nodes, of how to best route
your queries etc... It's the best way to go.

Jérémie

That definitely sounds like the best way to go, but I'm struggling to
understand some of what people are suggesting, sorry.
I'm looking through the docs (and have been for some time), but not really
seeing how to do any of this.
Could people suggest some of the config settings I need to research to
better understand some of the suggestions?

PS -- Sorry if I'm being dense. :-)

On Mon, 2011-10-17 at 18:29 +0530, Vineeth Mohan wrote:

Wouldn't it be a better idea to use a load balancer instead of having ES do it?
In that case, a change (like adding a new ES node or bringing down
the master) needs to be made in only one place, right?

A load balancer is one option, but frankly, if your client already
handles this issue, then you are adding a redundant layer.

The Perl client API, for example, accepts a list of potential nodes.

When connecting to the cluster for the first time, it tries each node in
the list in turn, until it gets a successful response.

Then it uses the cluster API to retrieve a list of all live nodes that
the cluster knows about.

It round-robins through the list of live nodes (to spread the load
between servers) and if any node fails, it tries to refresh the list of
live servers again. (It also refreshes the live list every $x
requests).

https://metacpan.org/source/DRTECH/ElasticSearch-0.46/lib/ElasticSearch/Transport.pm#L191

You can also configure the Perl client to not retrieve the live list,
but just to round-robin and failover using the provided list of nodes.
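
For anyone on the HTTP API without such a client, the same idea is easy to
sketch by hand. This is not the Perl module, just an illustrative Python
version of the round-robin-with-failover behaviour described above (host
names are placeholders):

```python
# Sketch: round-robin over a fixed node list and skip to the next node
# when a request fails, much as the Perl client does with its node list.
import itertools
import urllib.request

class RoundRobinClient:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def get(self, path):
        # try each node at most once per request
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                with urllib.request.urlopen(node + path, timeout=5) as resp:
                    return resp.read().decode()
            except OSError:
                continue  # node down or unreachable; try the next one
        raise RuntimeError("no live nodes in the list")

client = RoundRobinClient(["http://server1:9200",
                           "http://server2:9200",
                           "http://server3:9200"])
print(client.get("/_cluster/health"))
```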

clint

On Mon, 2011-10-17 at 14:05 +0100, doug livesey wrote:

So if I did this:

  1. Set up elasticsearch on my 3 servers, which are on the same network
  2. Gave them the same cluster.name
  3. Set node.master and node.data to be true for each of them
  4. Told the index I was using to have 3 replicas

That wouldn't achieve what I wanted?

That is exactly what you need to do on the server side, except 2
replicas, not 3. You have primary + 2 replicas = 3 in total.

So that's all you need on the ES side.

The bit that is missing is on the client side. It needs to know about
all the nodes you have, otherwise if it is only talking to one node and
that node goes down, then it can't failover.

The alternative to doing it in the client would be, as Vineeth suggests,
to use a load balancer which does know about all nodes.

But then what happens if your load balancer goes down? ;-)

clint

Right, and the HTTP API doesn't handle failover, so I (or a more fully
featured client) would have to handle that. Okay, thanks.
How do the nodes on my 3 servers know about each other?
So that when I index to one, the others know about it, too, to replicate it.
Or don't they?
Again, sorry if I'm being dense; I do seem to have made a number of false
assumptions about the features available from the HTTP API.
Cheers,
Doug.

On Mon, 2011-10-17 at 14:24 +0100, doug livesey wrote:

Right, and the HTTP API doesn't handle failover, so I (or a more
featured client) would have to handle that. Okay, thanks.
How do the nodes on my 3 servers know about each other?
So that when I index to one, the others know about it, too, to
replicate it.
Or don't they?

They do, and without any further configuration.

All you have to do is to make sure that:

  1. each node has the same cluster name and
  2. the nodes can see each other via port 9300
  3. multicast is enabled on your network (or you can configure your
    nodes to use unicast to discover each other; see the config sketch below)
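
For reference, a sketch of what that might look like in config/elasticsearch.yml
on each node (the cluster name and host names are placeholders; the unicast
lines only matter if multicast isn't available on your network):

```yaml
cluster.name: my-cluster   # must be identical on all 3 nodes

# only needed when multicast discovery is not available:
# discovery.zen.ping.multicast.enabled: false
# discovery.zen.ping.unicast.hosts: ["server1:9300", "server2:9300", "server3:9300"]
```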

Again, sorry if I'm being dense, I do seem to have made a number of
false assumptions about the features available from the HTTP API.

Note: this is not a failure of the HTTP API in ES, but it is the client
you are using which is missing this feature.

As long as your client can speak to the HTTP API of any live node, you
are fine. The problem is if you only speak to one node, and that node
dies, then your client doesn't know how to speak to the other nodes.

clint

Okay, I'm getting a bit clearer, thank you! :-)
So if I installed elasticsearch on my 3 servers (all on the same network,
with multicast enabled (is that an apt-get install?)), used the same
cluster name for them, and they could all see each other on port 9300, would
...

  1. An index created on one with 2 replicas automatically replicate across
    the 3 servers?
  2. A document indexed to one server automatically be replicated to 2 other
    replicas on the other two servers?
  3. A fallen-over node be able to bring itself back up, repair itself, and
    add itself back into the cluster? Would the service wrapper do this?

& thanks again for taking the time to answer my questions.
