Definition help


(project2501) #1

Hi,
I've read the online guides and searched for past threads. I'm
trying to get a clear definition of the following terms but having
trouble.

  1. Node - I think this means an instance of ES running on a server.
  2. Index - I know what this means in Lucene, but is it identical in ES
    or is there more? e.g. logical definition? or physical?
  3. Shard - I see that indices are created with 'shards'. But what IS a
    shard and why does it exist? When I have an index with 5 shards, does
    that mean 5 physical lucene indices that are separate?
  4. Replica - I read that a shard has a replica. How does that work? If
    I have 5 shards, I need 5 replicas for each shard as failover data? Or
    1 replica merges the data of 5 shards into itself?

Maybe there is a simple diagram somewhere showing these relationships.
Reading the API docs to learn these relationships is rather hard. Any
help appreciated.

thanks.


(Shay Banon) #2

On Wednesday, January 25, 2012 at 3:35 PM, project2501 wrote:

  1. Node - I think this means an instance of ES running on a server.

Yes, an instance of elasticsearch.

  2. Index - I know what this means in Lucene, but is it identical in ES
    or is there more? e.g. logical definition? or physical?

An index is a logical concept that encapsulates specific data. It is broken down into shards, and shards can have replicas. It also holds the mapping definitions and the specific settings associated with it.

  3. Shard - I see that indices are created with 'shards'. But what IS a
    shard and why does it exist? When I have an index with 5 shards, does
    that mean 5 physical lucene indices that are separate?

Yes, each shard is a separate Lucene index, and if you have replicas, then there are more (each replica is a Lucene index too).

  4. Replica - I read that a shard has a replica. How does that work? If
    I have 5 shards, I need 5 replicas for each shard as failover data? Or
    1 replica merges the data of 5 shards into itself?

If you have an index with 5 shards and index.number_of_replicas set to 1, then each shard will have a single replica (two copies).
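To make the arithmetic concrete, here is a minimal sketch that counts the shard copies (and so the separate Lucene indices) behind one index. The function name `lucene_index_count` is made up for illustration and is not part of elasticsearch; only the setting `index.number_of_replicas` comes from the reply above.

```python
def lucene_index_count(number_of_shards: int, number_of_replicas: int) -> int:
    """Count the shard copies (each a separate Lucene index) of one ES index.

    Hypothetical helper for illustration only, not an elasticsearch API.
    """
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least 1 shard and 0 or more replicas")
    # Each shard exists once as a primary, plus one copy per configured replica.
    return number_of_shards * (1 + number_of_replicas)


print(lucene_index_count(5, 0))  # 5 shards, no replicas
print(lucene_index_count(5, 1))  # 5 shards, each with a single replica
```

With 5 shards and `index.number_of_replicas` set to 1, the count is 5 * 2 = 10, which matches "a single replica (two copies)" per shard.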

Maybe there is a simple diagram somewhere showing these relationships.
Reading the API docs to learn these relationships is rather hard. Any
help appreciated.

http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram.html
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html



(project2501) #3

Thanks Shay.

So a shard is a 'physical' index, in the lucene sense? And an ES index
is broken down into multiple physical indexes for some reason?
Performance? scaling?

Appreciate the info.



(Lukáš Vlček) #4

Hi,

If an index is broken down into 5 shards, then each shard can be
allocated on a different physical machine (and thus each shard can grow up to
the capacity of that machine). Summed together, the 5 shards can form a
combined index that would not fit on any single one of those five machines.
So yes, this helps scalability as well as performance, because all
shards can be processed (for example, searched) concurrently.
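The idea can be sketched as a toy model in plain Python. This is only an illustration under stated assumptions, not how elasticsearch is implemented: the shards here are in-memory dicts rather than Lucene indices on separate machines, `zlib.crc32` is a stand-in for whatever hash elasticsearch uses internally, and all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

NUMBER_OF_SHARDS = 5  # assumption: matches the 5-shard example above


def shard_for(doc_id: str, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """Pick a shard for a document by hashing its id (stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def build_shards(docs: Dict[str, str],
                 number_of_shards: int = NUMBER_OF_SHARDS) -> List[Dict[str, str]]:
    """Split one logical index into independent per-shard stores."""
    shards: List[Dict[str, str]] = [{} for _ in range(number_of_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, number_of_shards)][doc_id] = text
    return shards


def search_shard(shard: Dict[str, str], term: str) -> List[str]:
    """Search a single shard; it only ever sees its own documents."""
    return [doc_id for doc_id, text in shard.items() if term in text.split()]


def search(shards: List[Dict[str, str]], term: str) -> List[str]:
    """Query every shard concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(search_shard, shards, [term] * len(shards)))
    return sorted(hit for hits in partial for hit in hits)


docs = {
    "1": "distributed search engine",
    "2": "lucene index",
    "3": "search shard replica",
}
shards = build_shards(docs)
print(search(shards, "search"))  # ['1', '3']
```

Each document lands in exactly one shard, and a search fans out to all shards and merges what they return. That is the scalability and concurrency point in miniature.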

Just note that elasticsearch has had the distributed notion in its DNA since
the very beginning, so everything in it is about distributed, concurrent and
possibly [near] real-time processing (including index and search
operations). That is why you need index sharding and many other concepts
found in it.

Regards,
Lukas



(project2501) #5

Thanks Lukas.

So if I have a cluster of 100 machines and I want to have an index for
'documents', would I have 1 'document' index and 100 shards? Then the
elasticsearch nodes discover each other and decide how to distribute the
load among them?



(Lukáš Vlček) #6

Yes.
I would recommend that you simply give ES a try. It would help you get
familiar with it much faster, btw.
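For anyone trying this out, here is a minimal sketch of the settings body for such an index. It only builds and prints the JSON and does not contact a cluster. The helper name `create_index_body` is hypothetical, and the value 100 mirrors the question above rather than being a sizing recommendation. The body could be sent with the create-index request (an HTTP PUT to the index name, for example `/documents`, where that name is an assumption).

```python
import json


def create_index_body(number_of_shards: int, number_of_replicas: int) -> str:
    """Build the JSON settings body for creating an index (hypothetical helper)."""
    settings = {
        "settings": {
            "index": {
                # How many shards the index is split into.
                "number_of_shards": number_of_shards,
                # How many extra copies of each shard are kept.
                "number_of_replicas": number_of_replicas,
            }
        }
    }
    return json.dumps(settings, indent=2)


# 100 shards and 1 replica, mirroring the 100-machine question above.
print(create_index_body(100, 1))
```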

Regards,
Lukáš


