Confusing replica count setting


(Ask Bjørn Hansen) #1

Hi everyone,

I am running 0.11.0.

I have a two node cluster (for testing and not much data). I wanted
to make sure that the data is on both nodes so if one goes away, the
system will continue running on the other.

I expected that I needed to set number_of_replicas to 2, but that
seems to actually make the system want to make 3 copies of the data,
is that right?

Vaguely related question:

What'd be the right setting for gateway.recover_after_nodes in a 2
node cluster?

  • ask

$ curl -XPUT 'http://x17.dev:9200/jobso/_settings' -d '{ "index" :
{ "number_of_replicas": 1 } }'; echo
{"ok":true}

$ curl -XGET 'http://x17.dev:9200/_cluster/health?pretty=true'
{
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

$ curl -XPUT 'http://x17.dev:9200/jobso/_settings' -d '{ "index" :
{ "number_of_replicas": 2 } }'; echo
{"ok":true}

$ curl -XGET 'http://x17.dev:9200/_cluster/health?pretty=true'
{
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 2,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
}


(Lukáš Vlček) #2

Hi,

On Wed, Sep 29, 2010 at 9:24 AM, Ask Bjørn Hansen ask@develooper.com wrote:

Hi everyone,

I am running 0.11.0.

I have a two node cluster (for testing and not much data). I wanted
to make sure that the data is on both nodes so if one goes away, the
system will continue running on the other.

I expected that I needed to set number_of_replicas to 2, but that
seems to actually make the system want to make 3 copies of the data,
is that right?

If number_of_replicas is set to 1, it means that one additional replica of
each shard can be allocated in the cluster (if a node is available; ES
will not allocate a shard and its replica on the same node).
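To make the arithmetic concrete, here is a quick sketch using the numbers from the health output in the transcript above (5 primary shards; number_of_replicas counts *extra* copies per primary, not the total):

```shell
# total shards wanted = primaries * (1 + number_of_replicas)
primaries=5
replicas=1
total=$(( primaries * (1 + replicas) ))
echo "$total"   # 10, matching active_shards in the green health output
```

With number_of_replicas set to 2 the same index would want 5 * (1 + 2) = 15 shards, but a 2-node cluster can only place 10 of them without co-locating a shard and its replica, which matches the 5 unassigned shards and yellow status in the second health check.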

Vaguely related question:

What'd be the right setting for gateway.recover_after_nodes in a 2
node cluster?

I found the following thread on the mailing list:
http://elasticsearch-users.115913.n3.nabble.com/gateway-recover-after-nodes-question-tp1480972.html
Is it helpful?



(Shay Banon) #3

Hey,

Lukas answered the question regarding the replicas: it's basically the
number of replicas for each shard. So 2 shards with 1 replica each will end
up with a total of 4 shards.

Regarding recover_after_nodes, it only affects the system after a full
cluster shutdown, so I would say set it to 2, since a full cluster shutdown
will probably mean you are there to make sure two nodes start :).
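As a sketch, that setting goes in elasticsearch.yml on each node (file name and location assumed from the standard layout):

```yaml
# elasticsearch.yml (on both nodes)
# After a full cluster restart, hold off on recovery until 2 nodes have joined.
gateway.recover_after_nodes: 2
```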

-shay.banon



(Ask Bjørn Hansen) #4

On Sep 29, 2:27 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Hey,

Lukas answered the question regarding the replicas: it's basically the
number of replicas for each shard. So 2 shards with 1 replica each will end
up with a total of 4 shards.

Yes, my confusion here is that "1 replica" means "one extra copy" –
this makes perfect sense but was just not the terminology I was used
to. For a small setup it's very nice that it automatically spreads
the data out to two copies by default.

Regarding the recover_after_nodes, it only affect the system after a full
cluster shutdown, so I would say set it to 2, since a full cluster shutdown
will probably mean you are there to make sure two nodes start :).

If after a full shutdown one node isn't coming back because of some
hardware problem or similar, would that prevent the other node from
starting up correctly?

  • ask

(Shay Banon) #5

On Thu, Sep 30, 2010 at 3:11 AM, Ask Bjørn Hansen ask@develooper.com wrote:


If after a full shutdown one node isn't coming back because of some
hardware problem or similar, would that prevent the other node from
starting up correctly?

Yes, the recovery process will not start until that number of nodes is
discovered, so if just one node is in the picture, nothing will be
recovered.


(Ask Bjørn Hansen) #6

On Sep 29, 10:13 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Yes, the recovery process will not start until that number of nodes is
discovered, so if just one node is in the picture, nothing will be
recovered.

For two-node clusters it'd be useful to have a setting for "wait X
seconds for another node, then start recovery anyway" -- or is there one
already?


(Shay Banon) #7

There is something similar: gateway.recover_after_time, which waits for
that amount of time before recovering. But there isn't a setting that
recovers once either 2 nodes have joined or the given time has passed.
It does make sense, though; want to open an issue for it (and suggest a
name for the setting :wink: )?
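For reference, later Elasticsearch releases added roughly this combination of settings; they may not exist in 0.11.0, and the names below are taken from later documentation, so treat this as a sketch:

```yaml
# elasticsearch.yml — sketch using gateway settings from later releases
gateway.recover_after_nodes: 1   # never recover with fewer nodes than this
gateway.expected_nodes: 2        # recover immediately once 2 nodes have joined
gateway.recover_after_time: 5m   # ...or after 5 minutes, whichever comes first
```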


