Cluster questions

Hi,

In ES, what controls which node/shard a doc will get indexed on?

What happens (or what does one need to do) when the search cluster is
expanded? Is there a notion of (automatic) rebalancing?

Similarly, what happens or should be done when a node in a cluster
goes down? Is there something that automatically replicates data that
disappeared when that node went down?

Thanks,
Otis

Otis,

I am sure most of these questions are addressed in Elasticsearch
presentation from Berlin:

http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010
Automatic shard allocation - pages 90-108.

It is also very easy to verify this yourself: change the logging level to DEBUG
for rootLogger in logging.yml, start one node, and index some data. Then
start a second node and watch the log files. Once the second node is available,
some index shards are allocated to it. By default ES uses 5 shards with
1 replica for each shard. If a node goes down, you can use the health API to
see whether all data is still available for search (
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/cluster/health/).
What happens next depends on how quickly nodes go down (crash, not
regular shutdown): if only one copy of a shard is left and no replica is
available, it should replicate itself to some other node (provided
number_of_replicas is set to 1 or more).

Regards,
Lukas

Thanks Lukas,

I looked at this the other day, but I don't think that answers my Qs,
or at least I can't tell without hearing Shay's accompanying talk. :)

So, questions:

  • Slide 94: cluster with settings: replicas = 1, shards = 2, and a
    node with 2 shards, OK

  • Slide 95: 2nd node appears and ends up with the same shards as on
    node 1.
    -- Q: doesn't this mean that shards were replicated? Why, if
    number_of_replicas=1 ?

  • Slide 96: 3rd and 4th nodes appear and after that each node in the
    cluster ends up with just 1 shard. This makes sense.
    -- Q: This happens automagically?
    -- Q: Does ES simply copy the index/shard from one node to the
    other when it detects more nodes joined the cluster?

  • Slide 107: a new index with 2 shards is added (shards=1,
    replicas=1), but 2 nodes get that new index, even though replicas=1.
    Huh?

So the above questions are really about understanding why an index/
shard is being replicated even when replicas=1.

I think the above may also answer what happens when the cluster is
expanded: ES detects new nodes joining and figures out that they can
handle/host some of the indices or index replicas and somehow send
them to the new nodes. I assume there are mechanisms in ES to let it
spread the data evenly, though, I am guessing, it doesn't take into
account query rates, so it is not yet able to migrate hot indices
around automatically?

I think the above doesn't cover what happens when 1 or more nodes go
down: I know ES detects that, but does it know which indices were on
those nodes and does it automatically create more replicas for those
indices in order to satisfy the replicas=X setting?

Thanks,
Otis

Otis,

number_of_replicas=1 means each shard has 1 replica, meaning there are 2
copies of each shard.
You seem to take the number of replicas as the number of copies, which is not
exactly the same thing.
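
To make the arithmetic concrete, here is a minimal Python sketch. The function name is made up for illustration and is not part of any ES API; it only encodes the rule that every primary shard gets number_of_replicas additional copies.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total copies in the cluster: each primary plus its extra replica copies."""
    return number_of_shards * (1 + number_of_replicas)


# Settings from slide 94: 2 shards, 1 replica each -> 2 primaries + 2 replicas.
print(total_shard_copies(2, 1))  # 4
# The ES defaults mentioned earlier in the thread: 5 shards, 1 replica each.
print(total_shard_copies(5, 1))  # 10
```

So number_of_replicas=1 roughly corresponds to what HDFS would call a replication factor of 2.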

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

Hi Otis,

I know that Shay would be able to shed more light on this... but let us do
our homework now (since many of your questions can be answered by running a
test).

One node A is started and a new index is created (2 shards, 1 replica):
curl -XPUT 'http://localhost:9200/twitter/' -d '
index :
    number_of_shards : 2
    number_of_replicas : 1
'
Now we can see node A has two shards allocated on it.
The cluster status is YELLOW.
(Cluster is not healthy but still all data is available for search.)

Second node B is started.
Node B has index replicas on it. (because number_of_replicas=1)
Cluster status is GREEN.
(At this point your cluster is healthy.)

Third node C is started.
One primary shard from node A is moved to C.
So now index primary shards are located on A and C. Both replicas on B.
Cluster status is GREEN.

Fourth node D is started.
Node D got one of the replicas from B.
Cluster status is GREEN.

Now you can shut down one node at a time until two nodes are left.
Cluster status is GREEN.

Shut down one more node, so that only one is left.
Two shards and no replicas are on the last remaining node.
Cluster status is YELLOW.

So, as you can see, if the replicas=X setting cannot be satisfied, you can
learn this from the cluster health status (
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/cluster/health/
).
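
The statuses seen in the test can be reproduced with a toy model. This is only a sketch under one assumption, namely that a replica is never placed on the same node as another copy of the same shard; the function below is hypothetical, not ES code, and it ignores in-flight recovery and any other allocation constraints.

```python
def cluster_health(nodes: int, number_of_replicas: int) -> str:
    """Toy health model: every shard needs (1 + replicas) distinct nodes."""
    if nodes < 1:
        return "red"      # no node at all, so not even primaries are assigned
    if nodes >= 1 + number_of_replicas:
        return "green"    # all primaries and all replicas can be placed
    return "yellow"       # primaries are assigned, some replicas are not


# Replay the test: start nodes A..D, then shut down to two nodes, then to one.
for node_count in (1, 2, 3, 4, 2, 1):
    print(node_count, cluster_health(node_count, number_of_replicas=1))
```

With number_of_replicas=0 the same model reports green on a single node, which matches the remark below about an index created with 0 replicas.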

Also see below for my other comments.

On Fri, Jun 18, 2010 at 11:31 PM, Otis otis.gospodnetic@gmail.com wrote:

Thanks Lukas,

I looked at this the other day, but I don't think that answers my Qs,
or at least I can't tell without hearing Shay's accompanying talk. :)

So, questions:

  • Slide 94: cluster with settings: replicas = 1, shards = 2, and a
    node with 2 shards, OK

  • Slide 95: 2nd node appears and ends up with the same shards as on
    node 1.
    -- Q: doesn't this mean that shards were replicated? Why, if
    number_of_replicas=1 ?

number_of_replicas = 1 means there is one replica (non-primary) shard
for each primary shard.
If you had created the index with 0 replicas and two shards, you would have
got GREEN status in the first step (meaning the replicas=X setting is
satisfied).

  • Slide 96: 3rd and 4th nodes appear and after that each node in the
    cluster ends up with just 1 shard. This makes sense.
    -- Q: This happens automagically?

As you can see from the test above, this happens automagically :)

-- Q: Does ES simply copy the index/shard from one node to the
other when it detects more nodes joined the cluster?

I think it can take it from the gateway, but I will let Shay provide more
details.

  • Slide 107: a new index with 2 shards is added (shards=1,
    replicas=1), but 2 nodes get that new index, even though replicas=1.
    Huh?

This is correct, isn't it? One primary shard is on one node and the second node
gets the replica of that shard (if it were not possible to find a second node
for the replica, you would be able to learn this from the health status).

So the above questions are really about understanding why an index/
shard is being replicated even when replicas=1.

I think the above may also answer what happens when the cluster is
expanded: ES detects new nodes joining and figures out that they can
handle/host some of the indices or index replicas and somehow send
them to the new nodes. I assume there are mechanisms in ES to let it
spread the data evenly, though, I am guessing, it doesn't take into
account query rates, so it is not yet able to migrate hot indices
around automatically?

I think this is not implemented yet. However, it seems that there is a lot
of data available, so implementing specific traffic load handlers would be
possible (I would be surprised if that is not planned).

I think the above doesn't cover what happens when 1 or more nodes go
down: I know ES detects that, but does it know which indices were on
those nodes and does it automatically create more replicas for those
indices in order to satisfy the replicas=X setting?

It is known which shards and their replicas are located on which nodes.
See the "routing_table.indices" part of
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/cluster/state/
(Note that with current master you can expect to get more info from the REST
API.)
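
As a rough illustration of reading that part of the cluster state, here is a Python sketch. The sample dictionary is hand-written, not real output, and its exact field layout is an assumption that may differ between ES versions; it mirrors the three-node step of the test (primaries on A and C, both replicas on B). The helper name is hypothetical.

```python
from collections import defaultdict

# Hand-written sample, NOT real API output; the field layout is an assumption.
sample_state = {
    "routing_table": {
        "indices": {
            "twitter": {
                "shards": {
                    "0": [
                        {"primary": True, "node": "A", "state": "STARTED"},
                        {"primary": False, "node": "B", "state": "STARTED"},
                    ],
                    "1": [
                        {"primary": True, "node": "C", "state": "STARTED"},
                        {"primary": False, "node": "B", "state": "STARTED"},
                    ],
                }
            }
        }
    }
}


def copies_by_node(state: dict) -> dict:
    """Group every shard copy in the routing table by the node hosting it."""
    placement = defaultdict(list)
    for index, entry in state["routing_table"]["indices"].items():
        for shard_id, copies in entry["shards"].items():
            for copy in copies:
                role = "primary" if copy["primary"] else "replica"
                placement[copy["node"]].append((index, int(shard_id), role))
    return dict(placement)


for node, copies in sorted(copies_by_node(sample_state).items()):
    print(node, copies)
```

Knowing this mapping is what lets the cluster tell which shard copies were lost when a given node disappears.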

Thanks,
Otis

Thank you for the detailed reply, Lukáš!

The reason why I didn't get the number_of_replicas=X meaning is
because in my mind "number of replicas" really means "number of
identical copies", while in ES it means "the number of additional
copies". Also, in HDFS there is a notion of a replication factor that
matches "my" thinking: if replication factor is 3, that means there
are 3 copies of each data block in HDFS. Isn't this the more common
way of thinking about/counting replicas?

Oh, and you mentioned that maybe ES copies replicas from the gateway,
but is gateway not an optional thing?

Thanks,
Otis

Gateway is required to support full cluster shutdown. See more here:
http://www.elasticsearch.com/blog/2010/02/16/searchengine_time_machine.html.

Recovery from the gateway happens only when the first ever shard gets
created. Once that is done, shards recover from another shard in the same
replication group when they are moved around or instantiated.

-shay.banon
