Newbie question on shard and replicas


(vineeth mohan) #1

Hi,

As far as I have explored ES, this is what I have understood:

  • A replica of an index holds all of its documents, and it is stored on
    one box.
  • A shard is a Lucene object which holds a part of the whole index. That
    is, if an index has 5 shards, the first shard will hold the first 20% of
    the index data, the second one will hold the 20% to 40% range, and so on.
  • When a search is run, the query hits all shards, and their results are
    aggregated to give the final result.
  • Each shard should have at least 1 copy of all the documents of that
    index.

My question is: if we set 2 shards and 2 replicas, will there be 3 copies
of the same index in the same shard (that is, 3 copies of the original data
in total), or 3 copies of the whole index in total across the whole cluster?

Thanks
Vineeth


(David Pilato) #2

For each shard, you will find one primary and two replicas in your cluster.

As you have 2 shards, you will have 50% of your docs in the first shard
(with 2 replica copies) and 50% in the second shard (with 2 replica copies).

If you have 100 docs using 10 MB, you will have 30 MB used in your cluster
(if you have enough nodes).

If you have 1 node, your cluster will be yellow and you will have shard0
with 5 MB and shard1 with 5 MB (primaries only, because a replica is never
allocated on the same node as its primary). So you will use 10 MB in your
cluster.

If you have 2 nodes, your cluster will still be yellow and you will have
shard0 primary and shard1 replica on the first node using 5 MB each, and
shard0 replica and shard1 primary on the second node using 5 MB each. So you
will use 20 MB in your cluster.

If you have 3 nodes, your cluster will be green and you will have something
like shard0 primary and shard1 replica on the first node using 5 MB each,
shard0 replica and shard1 primary on the second node using 5 MB each, and
shard0 replica and shard1 replica on the third node using 5 MB each. So you
will use 30 MB in your cluster.
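
To make the arithmetic above easy to replay, here is a minimal Python
sketch. It is only an illustration: the function name `cluster_usage` and
its parameters are made up for this example, and it assumes equal-sized
shards, at least one node, and that two copies of the same shard are never
placed on the same node.

```python
def cluster_usage(primary_mb, shards, replicas, nodes):
    """Estimate disk use and health for one index.

    Assumptions (for illustration only): shards are equal in size, there is
    at least one node, and copies of the same shard never share a node.
    """
    shard_mb = primary_mb / shards
    copies_wanted = 1 + replicas               # the primary plus its replicas
    copies_placed = min(copies_wanted, nodes)  # at most one copy per node
    used_mb = shard_mb * shards * copies_placed
    # Yellow means every primary is allocated but some replicas are not.
    health = "green" if copies_placed == copies_wanted else "yellow"
    return used_mb, health


for node_count in (1, 2, 3):
    print(node_count, cluster_usage(10, shards=2, replicas=2, nodes=node_count))
# 1 (10.0, 'yellow')
# 2 (20.0, 'yellow')
# 3 (30.0, 'green')
```

With 2 shards and 2 replicas, the cluster holds 2 x 3 = 6 shard copies once
there are enough nodes, which is where the 30 MB figure comes from.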

HTH,

David.

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of Vineeth Mohan
Sent: Thursday, 20 October 2011 19:16
To: elasticsearch@googlegroups.com
Subject: Newbie question on shard and replicas


(vineeth mohan) #3

That helps a lot, David. :)
Your post is really an eye-opener.

Thanks
Vineeth


(vineeth mohan) #4

A few more questions.

I just saw in some posts that once the number of shards is set for an index,
it can't be changed.
If my search system takes in lots of data after years of harvesting, I
might need to increase the number of shards to improve performance.
How can I achieve that?

Also, let's say there are just 2 machines, each running one instance of
Elasticsearch and each holding one shard with 0 replicas. In that case,
if one machine dies, will 50% of my data be lost?

Another question comes to mind. For a small set of data it's better to
use 1 shard. In such cases, can I set a condition so that ONLY IF the
document size exceeds N MB is data rebalanced to the next shard, and again
only if the 2 shards combined exceed M MB is the next shard used, and so on?

Thanks
Vineeth


(Shay Banon) #5

On Fri, Oct 21, 2011 at 5:36 AM, Vineeth Mohan vineethmohan@algotree.com wrote:

> A few more questions.
>
> I just saw in some posts that once the number of shards is set for an
> index, it can't be changed.
> If my search system takes in lots of data after years of harvesting, I
> might need to increase the number of shards to improve performance.
> How can I achieve that?

Correct, you can't change the number of shards. There are ways around that.
One is to create "more" shards than you need at the start, something like 20
(which will take you, size-wise, up to 20 machines of capacity), while
starting with only 3 machines. Another is to use several indices (you can
search across them). Routing can also come into play to constrain searches
to specific shards in an index with many shards.
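
As a rough illustration of why the shard count is fixed and how routing
narrows a search: a document's shard is derived from a hash of its routing
value (the document id by default) modulo the number of primary shards. The
Python sketch below uses `zlib.crc32` purely as a stand-in hash, which is
not the hash Elasticsearch uses, and the names `pick_shard` and `doc-N` are
made up for the example.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number.

    crc32 is a stand-in chosen because it is deterministic across runs;
    it is NOT the hash function Elasticsearch actually uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Hypothetical document ids, used only to show the effect of resizing.
doc_ids = [f"doc-{i}" for i in range(1000)]

# Going from 20 to 21 shards changes the target shard for most documents,
# so existing data would no longer be where lookups expect it.
moved = sum(pick_shard(d, 20) != pick_shard(d, 21) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")

# The same routing value always lands on the same shard, so a search that
# supplies it only needs to visit that one shard.
print(pick_shard("user-42", 20) == pick_shard("user-42", 20))
```

The same property that makes resizing impossible is what makes routing
useful: a value that was used at index time points to exactly one shard at
search time.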

> Also, let's say there are just 2 machines, each running one instance of
> Elasticsearch and each holding one shard with 0 replicas. In that case,
> if one machine dies, will 50% of my data be lost?

It will not be lost if you can bring that machine back with the data it
held.

> Another question comes to mind. For a small set of data it's better to
> use 1 shard. In such cases, can I set a condition so that ONLY IF the
> document size exceeds N MB is data rebalanced to the next shard, and again
> only if the 2 shards combined exceed M MB is the next shard used, and so on?

You can do that simply by using as many indices as you want and "adding"
indices later on. You can always search across more than one index.
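
For example, a search can target several indices by listing their names,
comma-separated, in the request path. The small Python helper below only
builds such a URL; the helper name `search_url`, the host, and the index
names are hypothetical, chosen to show one index per year being added over
time.

```python
def search_url(base_url, indices):
    """Build a _search URL that targets several indices at once."""
    if not indices:
        raise ValueError("need at least one index name")
    return f"{base_url.rstrip('/')}/{','.join(indices)}/_search"


# Hypothetical host and index names, one index added per year.
print(search_url("http://localhost:9200", ["docs-2010", "docs-2011"]))
# http://localhost:9200/docs-2010,docs-2011/_search
```

Each index keeps its own small, fixed shard count, and capacity grows by
creating a new index rather than by resizing an old one.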


