Data balancing and backup in the cluster

Barak_Yaish · January 15, 2011, 8:35pm

Hello,

I'm taking my first steps with ES and have a few questions:

If the cluster is composed of N nodes, is the data split equally
across the nodes?
If a node crashes, is its data backed up on the other nodes,
so that no data is lost in such a crash? And if a client uses this
node for queries, will it still get answers, or be notified that an error
occurred?
If the master node crashes, does the cluster keep functioning?
When a new node joins the cluster, how long does the cluster
take to rebalance the data (say 1B docs, 4-node cluster)?

Thanks

kimchy · January 16, 2011, 9:40am

On Saturday, January 15, 2011 at 10:35 PM, barak wrote:

Hello,

I'm taking my first steps with ES and have a few questions:

If the cluster is composed of N nodes, is the data split equally
across the nodes?

The cluster aims to allocate an equal number of shards to each node.
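
To make "an equal number of shards on each node" concrete, here is a small toy model. It is only a sketch: the round-robin rule and the node names are assumptions for illustration, not the allocator Elasticsearch actually uses.

```python
from collections import Counter

def allocate_round_robin(num_shards, nodes):
    """Toy allocation: shard i goes to nodes[i % len(nodes)].

    Illustrative only; not Elasticsearch's real allocation logic.
    """
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}

# Hypothetical node names and shard count.
nodes = ["node1", "node2", "node3", "node4"]
allocation = allocate_round_robin(10, nodes)

# Count how many shards each node ended up with.
per_node = Counter(allocation.values())
print(per_node)
```

With 10 shards on 4 nodes the counts cannot be identical, so "even" here means the per-node counts differ by at most one.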

If a node crashes, is its data backed up on the other nodes,
so that no data is lost in such a crash? And if a client uses this
node for queries, will it still get answers, or be notified that an error
occurred?

Each shard can have one or more replicas. If a node crashes, the replicas held on other nodes serve as its backup, so no data is lost, and the shards that were allocated on that node get reallocated across the remaining nodes.
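
The reasoning behind "no data is lost" can be sketched with a toy placement model. Assumptions here: one replica per shard, the replica always lives on a different node than the primary, and the placement rule and node names are made up for illustration.

```python
def place_with_replica(num_shards, nodes):
    """Toy placement: primary on one node, its single replica on the next node.

    Requires at least two nodes so the two copies never share a node.
    """
    placement = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]
        placement[shard] = {primary, replica}
    return placement

def surviving_copies(placement, crashed):
    """Return, per shard, the nodes that still hold a copy after one node crashes."""
    return {shard: holders - {crashed} for shard, holders in placement.items()}

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical names
placement = place_with_replica(5, nodes)
after_crash = surviving_copies(placement, "node2")

# Shards with no remaining copy would be lost; with one replica there are none.
lost = [shard for shard, holders in after_crash.items() if not holders]
print(lost)
```

In this model every shard keeps at least one copy after a single node crash. With zero replicas, the shards on the crashed node would have no other copy.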

If a client is querying through that node when it crashes, you need to send the query to another node. If you use HTTP with the REST API, you can simply round-robin between servers.
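
A minimal sketch of that round-robin idea on the client side, using only the Python standard library. The class name, the host URLs, and the `fetch` parameter are hypothetical; a real deployment might instead put a load balancer in front of the nodes.

```python
import itertools
import urllib.request

def http_get(url, timeout=2.0):
    """Plain HTTP GET; raises an OSError subclass on connection problems."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

class RoundRobinClient:
    """Cycle through nodes, moving on to the next one when a request fails."""

    def __init__(self, hosts, fetch=http_get):
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)
        self._fetch = fetch  # injectable so the logic can be tried without a cluster

    def get(self, path):
        last_error = None
        # Try each node at most once per request.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            try:
                return self._fetch(host + path)
            except OSError as err:
                # Refused connections, timeouts, and urllib's URLError
                # (including HTTP error statuses) are all OSError subclasses.
                last_error = err
        raise ConnectionError("no node reachable") from last_error
```

Usage would look like `RoundRobinClient(["http://node1:9200", "http://node2:9200"]).get("/_search")`, where the host names are placeholders. Successive calls start at different nodes, and a crashed node is skipped for that request.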

If the master node crashes, does the cluster keep functioning?

Yes, another node will be elected as master.

When a new node joins the cluster, how long does the cluster
take to rebalance the data (say 1B docs, 4-node cluster)?

It depends on your network. No reindexing is done; the data (shards) is just moved between nodes.
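
Since the cost is mostly data transfer, a back-of-the-envelope estimate is possible. Everything numeric below is an assumption made up for illustration (average document size, network throughput, going from 4 to 5 nodes, no replicas, the network as the only bottleneck); it is a rough lower bound, not a measurement.

```python
def rebalance_seconds(total_bytes, old_nodes, new_nodes, bandwidth_bytes_per_s):
    """Rough estimate: bytes that must land on the new nodes / network throughput."""
    # After an even rebalance, the new nodes hold this fraction of all data.
    moved_fraction = (new_nodes - old_nodes) / new_nodes
    return total_bytes * moved_fraction / bandwidth_bytes_per_s

docs = 1_000_000_000           # 1B docs, from the question
bytes_per_doc = 1_000          # ASSUMPTION: about 1 KB per indexed doc
total = docs * bytes_per_doc   # about 1 TB of index data
bandwidth = 100 * 1_000_000    # ASSUMPTION: about 100 MB/s sustained transfer

seconds = rebalance_seconds(total, 4, 5, bandwidth)
print(round(seconds / 60), "minutes")
```

Under these assumptions about one fifth of the data (roughly 200 GB) moves, which takes on the order of half an hour. Different document sizes or a slower link change the result proportionally.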


Barak_Yaish · January 16, 2011, 2:53pm

On Jan 16, 11:40 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If a client is querying through that node when it crashes, you need to send the query to another node. If you use HTTP with the REST API, you can simply round-robin between servers.

Does this mean that Java clients (TransportClient) cannot be cached?
When nodes are added to or removed from the cluster, must the client be
recreated to recognize the change?

Lukas_Vlcek1 · January 16, 2011, 3:47pm

Hi,

TransportClient can learn about changes in the cluster, so there is no need
to recreate it.

Regards,
Lukas
