Shards/replica strategy, possibility to re-balance them


(Ludovic Levesque) #1

Hi all, Shay,

can we force a re-balance of shards and replicas over nodes ?

Currently, I have 7 nodes for three indices, each with 64 shards + 2
replicas (total: 576 shards)

And the distribution is not homogen:
inet[/10.12.11.18:9300] has 82 shards or replicas: indice1(61), indice2(21)
inet[/10.11.11.15:9300] has 83 shards or replicas: indice1(19),
indice2(26), indice3(38)
inet[/10.11.11.17:9300] has 82 shards or replicas: indice1(15),
indice2(40), indice3(27)
inet[/10.12.11.17:9300] has 83 shards or replicas: indice1(49),
indice2(21), indice3(13)
inet[/10.12.11.16:9300] has 82 shards or replicas: indice1(34),
indice2(33), indice3(15)
inet[/10.11.11.16:9300] has 82 shards or replicas: indice1(14),
indice2(28), indice3(40)
inet[/10.11.11.18:9300] has 82 shards or replicas:
indice2(23), indice3(59)

problem is that indice3 is bigger than others (30 times bigger), and I
would prefer to have it distributed on all nodes instead of having 59
shards/replicas on latest node, and nothing on first node.

It's a test cluster, so no big problems for the moment, but for
production, I prefer to know if it's possible or not.

Another question about distribution: is it possible to choose the
distribution of shard and related replicas ? to put replicas on nodes
which are on different subnets for example ?
It will be part of mutli-datacenter deployment, to be able to have ES
ready even if one datacenter is entirely down.

Regards,
Ludo


(Shay Banon) #2

Hi,

There isn't an option to control this (yet). Its on the roadmap. Out of
curiosity, a 64 shards index is quite a big one, are you sure you need this
many?

-shay.banon

On Wed, Nov 10, 2010 at 11:49 AM, Ludovic Levesque luddic@gmail.com wrote:

Hi all, Shay,

can we force a re-balance of shards and replicas over nodes ?

Currently, I have 7 nodes for three indices, each with 64 shards + 2
replicas (total: 576 shards)

And the distribution is not homogen:
inet[/10.12.11.18:9300] has 82 shards or replicas: indice1(61),
indice2(21)
inet[/10.11.11.15:9300] has 83 shards or replicas: indice1(19),
indice2(26), indice3(38)
inet[/10.11.11.17:9300] has 82 shards or replicas: indice1(15),
indice2(40), indice3(27)
inet[/10.12.11.17:9300] has 83 shards or replicas: indice1(49),
indice2(21), indice3(13)
inet[/10.12.11.16:9300] has 82 shards or replicas: indice1(34),
indice2(33), indice3(15)
inet[/10.11.11.16:9300] has 82 shards or replicas: indice1(14),
indice2(28), indice3(40)
inet[/10.11.11.18:9300] has 82 shards or replicas:
indice2(23), indice3(59)

problem is that indice3 is bigger than others (30 times bigger), and I
would prefer to have it distributed on all nodes instead of having 59
shards/replicas on latest node, and nothing on first node.

It's a test cluster, so no big problems for the moment, but for
production, I prefer to know if it's possible or not.

Another question about distribution: is it possible to choose the
distribution of shard and related replicas ? to put replicas on nodes
which are on different subnets for example ?
It will be part of mutli-datacenter deployment, to be able to have ES
ready even if one datacenter is entirely down.

Regards,
Ludo


(Ludovic Levesque) #3

Hi Shay,

thanks for the answer.

It leads to another question:
for a 20 000 000 documents indices, how many shards/replicas do you
recommend, given we have many custom_score, and average document size:
3KB.

Is there a drawback to use too many shards ?
What is the good strategy for a final cluster of N nodes:

  • having N shards (+ replicas)
  • having about N/2 shards
  • less ?
  • more ?

Thanks again,
Ludo

On Wed, Nov 10, 2010 at 12:02 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Hi,
There isn't an option to control this (yet). Its on the roadmap. Out of
curiosity, a 64 shards index is quite a big one, are you sure you need this
many?
-shay.banon
On Wed, Nov 10, 2010 at 11:49 AM, Ludovic Levesque luddic@gmail.com wrote:

Hi all, Shay,

can we force a re-balance of shards and replicas over nodes ?

Currently, I have 7 nodes for three indices, each with 64 shards + 2
replicas (total: 576 shards)

And the distribution is not homogen:
inet[/10.12.11.18:9300] has 82 shards or replicas: indice1(61),
indice2(21)
inet[/10.11.11.15:9300] has 83 shards or replicas: indice1(19),
indice2(26), indice3(38)
inet[/10.11.11.17:9300] has 82 shards or replicas: indice1(15),
indice2(40), indice3(27)
inet[/10.12.11.17:9300] has 83 shards or replicas: indice1(49),
indice2(21), indice3(13)
inet[/10.12.11.16:9300] has 82 shards or replicas: indice1(34),
indice2(33), indice3(15)
inet[/10.11.11.16:9300] has 82 shards or replicas: indice1(14),
indice2(28), indice3(40)
inet[/10.11.11.18:9300] has 82 shards or replicas:
indice2(23), indice3(59)

problem is that indice3 is bigger than others (30 times bigger), and I
would prefer to have it distributed on all nodes instead of having 59
shards/replicas on latest node, and nothing on first node.

It's a test cluster, so no big problems for the moment, but for
production, I prefer to know if it's possible or not.

Another question about distribution: is it possible to choose the
distribution of shard and related replicas ? to put replicas on nodes
which are on different subnets for example ?
It will be part of mutli-datacenter deployment, to be able to have ES
ready even if one datacenter is entirely down.

Regards,
Ludo


(Shay Banon) #4

The way search on an index works is by hitting all the shards (round robin
between replicas). For example, an index with 10 shards will cause a search
to be executed (in parallel) on those 10 shards. The main reason that you
would want to have more shards than the number of nodes is to allow for
future scaling out, a 64 shard index with 1 replica will be able to scale
out to 128 machines before reaching its limit.

If 7 machines are good, I suggest using something like 7-10 shards with N
replica (you decide). Then run your data and see if things look ok. In your
case, you are using more than one index, so take that into account as well.

-shay.banon

On Wed, Nov 10, 2010 at 1:12 PM, Ludovic Levesque luddic@gmail.com wrote:

Hi Shay,

thanks for the answer.

It leads to another question:
for a 20 000 000 documents indices, how many shards/replicas do you
recommend, given we have many custom_score, and average document size:
3KB.

Is there a drawback to use too many shards ?
What is the good strategy for a final cluster of N nodes:

  • having N shards (+ replicas)
  • having about N/2 shards
  • less ?
  • more ?

Thanks again,
Ludo

On Wed, Nov 10, 2010 at 12:02 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Hi,
There isn't an option to control this (yet). Its on the roadmap. Out
of
curiosity, a 64 shards index is quite a big one, are you sure you need
this
many?
-shay.banon
On Wed, Nov 10, 2010 at 11:49 AM, Ludovic Levesque luddic@gmail.com
wrote:

Hi all, Shay,

can we force a re-balance of shards and replicas over nodes ?

Currently, I have 7 nodes for three indices, each with 64 shards + 2
replicas (total: 576 shards)

And the distribution is not homogen:
inet[/10.12.11.18:9300] has 82 shards or replicas: indice1(61),
indice2(21)
inet[/10.11.11.15:9300] has 83 shards or replicas: indice1(19),
indice2(26), indice3(38)
inet[/10.11.11.17:9300] has 82 shards or replicas: indice1(15),
indice2(40), indice3(27)
inet[/10.12.11.17:9300] has 83 shards or replicas: indice1(49),
indice2(21), indice3(13)
inet[/10.12.11.16:9300] has 82 shards or replicas: indice1(34),
indice2(33), indice3(15)
inet[/10.11.11.16:9300] has 82 shards or replicas: indice1(14),
indice2(28), indice3(40)
inet[/10.11.11.18:9300] has 82 shards or replicas:
indice2(23), indice3(59)

problem is that indice3 is bigger than others (30 times bigger), and I
would prefer to have it distributed on all nodes instead of having 59
shards/replicas on latest node, and nothing on first node.

It's a test cluster, so no big problems for the moment, but for
production, I prefer to know if it's possible or not.

Another question about distribution: is it possible to choose the
distribution of shard and related replicas ? to put replicas on nodes
which are on different subnets for example ?
It will be part of mutli-datacenter deployment, to be able to have ES
ready even if one datacenter is entirely down.

Regards,
Ludo


(Ludovic Levesque) #5

Ok thanks. I choose 64 shards for future scaling out, yes.

Ludo

On Wed, Nov 10, 2010 at 4:03 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

The way search on an index works is by hitting all the shards (round robin
between replicas). For example, an index with 10 shards will cause a search
to be executed (in parallel) on those 10 shards. The main reason that you
would want to have more shards than the number of nodes is to allow for
future scaling out, a 64 shard index with 1 replica will be able to scale
out to 128 machines before reaching its limit.
If 7 machines are good, I suggest using something like 7-10 shards with N
replica (you decide). Then run your data and see if things look ok. In your
case, you are using more than one index, so take that into account as well.
-shay.banon

On Wed, Nov 10, 2010 at 1:12 PM, Ludovic Levesque luddic@gmail.com wrote:

Hi Shay,

thanks for the answer.

It leads to another question:
for a 20 000 000 documents indices, how many shards/replicas do you
recommend, given we have many custom_score, and average document size:
3KB.

Is there a drawback to use too many shards ?
What is the good strategy for a final cluster of N nodes:

  • having N shards (+ replicas)
  • having about N/2 shards
  • less ?
  • more ?

Thanks again,
Ludo

On Wed, Nov 10, 2010 at 12:02 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Hi,
There isn't an option to control this (yet). Its on the roadmap. Out
of
curiosity, a 64 shards index is quite a big one, are you sure you need
this
many?
-shay.banon
On Wed, Nov 10, 2010 at 11:49 AM, Ludovic Levesque luddic@gmail.com
wrote:

Hi all, Shay,

can we force a re-balance of shards and replicas over nodes ?

Currently, I have 7 nodes for three indices, each with 64 shards + 2
replicas (total: 576 shards)

And the distribution is not homogen:
inet[/10.12.11.18:9300] has 82 shards or replicas: indice1(61),
indice2(21)
inet[/10.11.11.15:9300] has 83 shards or replicas: indice1(19),
indice2(26), indice3(38)
inet[/10.11.11.17:9300] has 82 shards or replicas: indice1(15),
indice2(40), indice3(27)
inet[/10.12.11.17:9300] has 83 shards or replicas: indice1(49),
indice2(21), indice3(13)
inet[/10.12.11.16:9300] has 82 shards or replicas: indice1(34),
indice2(33), indice3(15)
inet[/10.11.11.16:9300] has 82 shards or replicas: indice1(14),
indice2(28), indice3(40)
inet[/10.11.11.18:9300] has 82 shards or replicas:
indice2(23), indice3(59)

problem is that indice3 is bigger than others (30 times bigger), and I
would prefer to have it distributed on all nodes instead of having 59
shards/replicas on latest node, and nothing on first node.

It's a test cluster, so no big problems for the moment, but for
production, I prefer to know if it's possible or not.

Another question about distribution: is it possible to choose the
distribution of shard and related replicas ? to put replicas on nodes
which are on different subnets for example ?
It will be part of mutli-datacenter deployment, to be able to have ES
ready even if one datacenter is entirely down.

Regards,
Ludo


(system) #6