Data balancing and backup in the cluster

Hello,

I'm taking my first steps with ES and have a few questions:

  1. If the cluster is composed of N nodes, is the data split equally
    across the nodes?
  2. If a node crashes, is its data backed up on the other nodes, so
    that no data is lost? And if a client uses this node for queries,
    will it still get answers, or be notified that an error occurred?
  3. If the master node crashes, does the cluster keep functioning?
  4. When a new node joins the cluster, how long does the cluster take
    to rebalance the data (say, 1B docs on a 4-node cluster)?

Thanks

On Saturday, January 15, 2011 at 10:35 PM, barak wrote:

Hello,

I'm taking my first steps with ES and have a few questions:

  1. If the cluster is composed of N nodes, is the data split equally
    across the nodes?

The cluster aims to allocate an equal number of shards to each node.
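As a rough illustration of what "an equal number of shards per node" means, here is a toy model in Python. It is not Elasticsearch's actual allocator; the node names, the shard count, and the round-robin assignment are all assumptions made up for the example.

```python
from collections import Counter


def assign_shards(num_shards, nodes):
    """Assign shard ids to nodes in round-robin order (toy model only)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def shards_per_node(assignment, nodes):
    """Count how many shards each node holds, including nodes with none."""
    counts = Counter(assignment.values())
    return {node: counts.get(node, 0) for node in nodes}


# Hypothetical 4-node cluster holding 10 shards.
nodes = ["node-1", "node-2", "node-3", "node-4"]
assignment = assign_shards(10, nodes)
counts = shards_per_node(assignment, nodes)
print(counts)
```

With 10 shards on 4 nodes the per-node counts differ by at most one. Note that this evens out shard counts, so the data volume per node is only as even as the shards themselves.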

  2. If a node crashes, is its data backed up on the other nodes, so
    that no data is lost? And if a client uses this node for queries,
    will it still get answers, or be notified that an error occurred?

Each shard can have one or more replicas. If a node crashes, the replicas serve as the backup for its shards, so no data is lost, and the shards that were allocated on that node are reallocated to the remaining nodes.
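To make the replica count concrete, here is a small Python sketch that builds the settings body one might send when creating an index. The setting names `number_of_shards` and `number_of_replicas` are the index settings as I know them; the values 5 and 1 are arbitrary, and you should check the exact request format against your version.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build an index-creation body with the given shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard plus its replicas is one set of copies."""
    return number_of_shards * (1 + number_of_replicas)


# Arbitrary example values: 5 primary shards, 1 replica of each.
body = json.dumps(index_settings(5, 1))
copies = total_shard_copies(5, 1)
print(body)
print(copies)
```

With one replica per shard there are two copies of every shard, which is what lets the cluster survive the loss of a single node without losing data.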

If a client uses that node to query, and that node crashes, then you need to use another node to query. If you use HTTP with the REST API, then you can simply round robin between servers.
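A minimal sketch of the round-robin idea in Python, using only the standard library. The class name, the host names, and the injectable `fetch` parameter are assumptions for illustration and are not part of any Elasticsearch client.

```python
import itertools
import urllib.request


class RoundRobinClient:
    """Rotate GET requests across nodes, skipping nodes that fail."""

    def __init__(self, base_urls, fetch=None):
        if not base_urls:
            raise ValueError("at least one node URL is required")
        self._urls = list(base_urls)
        self._cycle = itertools.cycle(self._urls)
        # fetch is injectable so the rotation can be exercised without a network.
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url, timeout=5):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()

    def get(self, path):
        """Try each node at most once, starting with the next one in rotation."""
        last_error = None
        for _ in range(len(self._urls)):
            base = next(self._cycle)
            try:
                return self._fetch(base + path)
            except OSError as err:
                # URLError, timeouts and refused connections are all OSError.
                # HTTP error statuses (HTTPError) also land here and trigger
                # a retry on the next node, which may or may not be desired.
                last_error = err
        raise ConnectionError("all nodes failed") from last_error
```

Usage would look something like `RoundRobinClient(["http://es1:9200", "http://es2:9200"]).get("/_search")`, where the host names are hypothetical. If one node is down, the request is retried on the next node in the list.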

  3. If the master node crashes, does the cluster keep functioning?

Yes, another node will be elected as master.

  4. When a new node joins the cluster, how long does the cluster take
    to rebalance the data (say, 1B docs on a 4-node cluster)?

It depends on your network. No reindexing is done; shards are simply moved between nodes.
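For a back-of-the-envelope feel, here is a Python sketch of the arithmetic. Every number in it is an assumption and not a measurement: an average document size of 1 KB, a sustained transfer rate of 100 MB/s, and the simplification that a fifth node ends up receiving about one fifth of the data. It ignores replicas, throttling, and concurrent indexing or search load.

```python
def estimate_rebalance_seconds(total_bytes, nodes_after, bytes_per_second):
    """Rough time to move one node's even share of the data over the network."""
    moved_bytes = total_bytes / nodes_after  # assumed share for the new node
    return moved_bytes / bytes_per_second


# Assumed inputs: 1B docs at about 1 KB each, and 100 MB/s of usable bandwidth.
total_bytes = 1_000_000_000 * 1_000
bandwidth = 100 * 1_000_000

# A 4-node cluster growing to 5 nodes.
seconds = estimate_rebalance_seconds(total_bytes, 5, bandwidth)
print(f"{seconds / 60:.0f} minutes")
```

Under these assumptions the move takes on the order of half an hour. Real numbers will vary with document size and network speed, which is why the answer depends on your setup.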

Thanks

On Jan 16, 11:40 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If a client uses that node to query, and that node crashes, then you need to use another node to query. If you use HTTP with the REST API, then you can simply round robin between servers.

Does this mean that Java clients (TransportClient) cannot be cached?
When nodes are added to or removed from the cluster, must the client
be recreated to recognize the change?

Hi,

TransportClient can learn about changes in the cluster, so there is no
need to recreate it.

Regards,
Lukas
On 16.1.2011 15:53, "barak" barak.yaish@gmail.com wrote:

On Jan 16, 11:40 am, Shay Banon shay.ba...@elasticsearch.com wrote:

If a client uses that node to query, and that node crashes, then you need
to use another node to query. If you use HTTP with the REST API, then you
can simply round robin between servers.

Does this mean that Java clients (TransportClient) cannot be cached?
When nodes are added to or removed from the cluster, must the client
be recreated to recognize the change?