Definition help

project2501 · January 25, 2012, 1:35pm

Hi,
I've read the online guides and searched for past threads. I'm
trying to get a clear definition of the following terms but having
trouble.

Node - I think this means an instance of ES running on a server.
Index - I know what this means in Lucene, but is it identical in ES
or is there more? e.g. logical definition? or physical?
Shard - I see that indices are created with 'shards'. But what IS a
shard and why does it exist? When I have an index with 5 shards, does
that mean 5 physical lucene indices that are separate?
Replica - I read that a shard has a replica. How does that work? If
I have 5 shards, I need 5 replicas for each shard as failover data? Or
1 replica merges the data of 5 shards into itself?

Maybe there is a simple diagram somewhere showing these relationships.
Reading the API docs to learn these relationships is rather hard. Any
help appreciated.

thanks.

kimchy · January 25, 2012, 4:43pm

On Wednesday, January 25, 2012 at 3:35 PM, project2501 wrote:

Node - I think this means an instance of ES running on a server.

Yes, an instance of elasticsearch.

Index - I know what this means in Lucene, but is it identical in ES
or is there more? e.g. logical definition? or physical?

An index is a logical concept that encapsulates specific data. It is broken down into shards, and shards can have replicas. It also holds the mapping definitions and the specific settings associated with it.

Shard - I see that indices are created with 'shards'. But what IS a
shard and why does it exist? When I have an index with 5 shards, does
that mean 5 physical lucene indices that are separate?

Yes, and if you have replicas, then more.

Replica - I read that a shard has a replica. How does that work? If
I have 5 shards, I need 5 replicas for each shard as failover data? Or
1 replica merges the data of 5 shards into itself?

If you have an index with 5 shards and index.number_of_replicas set to 1, then each shard will have a single replica (two copies of each shard: the primary and its replica).
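
As a rough illustration of the arithmetic in this answer, here is a minimal Python sketch. The helper name shard_copies is hypothetical (it is not part of any elasticsearch client); the two setting names are the index settings index.number_of_shards and index.number_of_replicas. Each copy counted here is a separate Lucene index.

```python
def shard_copies(number_of_shards: int, number_of_replicas: int) -> dict:
    """Count shard copies for one ES index (hypothetical helper, not an ES API)."""
    primaries = number_of_shards
    # number_of_replicas is per shard, so every primary gets that many replicas.
    replicas = number_of_shards * number_of_replicas
    return {"primaries": primaries, "replicas": replicas, "total": primaries + replicas}


# Settings in the shape of an index settings body (values from the example above).
settings = {"index": {"number_of_shards": 5, "number_of_replicas": 1}}

counts = shard_copies(
    settings["index"]["number_of_shards"],
    settings["index"]["number_of_replicas"],
)
print(counts)  # {'primaries': 5, 'replicas': 5, 'total': 10}
```

So with 5 shards and 1 replica there are 10 Lucene indices in total, not 5 replicas per shard and not one merged replica.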

project2501 · January 26, 2012, 1:51pm

Thanks Shay.

So a shard is a 'physical' index, in the lucene sense? And an ES index
is broken down into multiple physical indexes for some reason?
Performance? scaling?

Appreciate the info.

Lukas_Vlcek1 · January 26, 2012, 5:29pm

Hi,

If an index is broken down into 5 shards, then each shard can be
allocated on a different physical machine (and thus each shard can grow up to
the capacity of that machine). If you were to sum up all 5 shards, you
could get a combined index that would not fit on any single one of those
five machines. So yes, this helps scalability as well as performance, because all
shards can be processed (for example searched) concurrently.

Just note that elasticsearch has had the distributed notion in its DNA from
the very beginning, so everything in it is about distributed, concurrent and
possibly [near] real-time processing (including index and search
operations). That is why you need index sharding and many of the other concepts
found in it.
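
To make the idea concrete, here is a toy Python sketch of splitting documents across 5 shards and searching all of them concurrently. It is only an illustration under stated assumptions: the CRC32-modulo routing, the in-memory dict "shards", and the function names shard_for, index_docs, search_shard and search are all made up for this example and are not how elasticsearch is implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 5  # matches the 5-shard example in this thread


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Illustrative routing only: stable hash of the id, modulo the shard count.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_docs(docs: dict, num_shards: int = NUM_SHARDS) -> list:
    # Each "shard" is a plain dict standing in for a separate Lucene index.
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard: dict, term: str) -> list:
    # Naive per-shard search: whole-word match on lowercased text.
    return [doc_id for doc_id, text in shard.items() if term in text.lower().split()]


def search(shards: list, term: str) -> list:
    # Scatter the query to every shard concurrently, then gather and merge.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(lambda s: search_shard(s, term), shards))
    return sorted(hit for hits in partial for hit in hits)


docs = {
    "1": "sharding splits an index",
    "2": "replicas give failover",
    "3": "each shard is a lucene index",
}
shards = index_docs(docs)
print(search(shards, "index"))  # ['1', '3']
```

Because every shard holds only part of the data, each one could live on its own machine, and the search time is bounded by the slowest shard rather than by the size of the whole index.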

Regards,
Lukas

project2501 · January 28, 2012, 12:42pm

Thanks Lukas.

So if I have a cluster of 100 machines and I want to have an index for
'documents', would I have 1 'documents' index and 100 shards? And would the
elasticsearch nodes then discover each other and decide how to distribute
the load among them?

Lukas_Vlcek1 · January 28, 2012, 2:23pm

Yes.
I would recommend simply giving ES a try. It will help you get
familiar with it much faster.

Regards,
Lukáš
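
For readers who want to picture the 100-machine case before trying it, here is a toy Python sketch of spreading one index's shard copies over a cluster. Everything in it is an assumption for illustration: the function name allocate, the round-robin rule, and the choice of 1 replica are hypothetical, and elasticsearch's real allocation logic is more involved. The one property it deliberately mimics is that a replica is not placed on the same node as its primary.

```python
def allocate(num_nodes: int, num_shards: int, num_replicas: int) -> dict:
    """Toy round-robin placement: copy c of shard s goes to node (s + c) % num_nodes."""
    if num_replicas + 1 > num_nodes:
        # With fewer nodes than copies, two copies of a shard would share a node.
        raise ValueError("not enough nodes to keep copies of a shard apart")
    nodes = {n: [] for n in range(num_nodes)}
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):
            role = "primary" if copy == 0 else "replica"
            nodes[(shard + copy) % num_nodes].append((shard, role))
    return nodes


# Assumed scenario: 100 nodes, one index with 100 shards and 1 replica each.
layout = allocate(num_nodes=100, num_shards=100, num_replicas=1)
print(layout[0])  # [(0, 'primary'), (99, 'replica')]
```

In this sketch every node ends up holding two shard copies, and losing any single node still leaves one copy of every shard elsewhere in the cluster.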
