If an index is broken down into 5 shards, then each shard can be
allocated to a different physical machine (and thus each shard can grow up
to the capacity of that machine). If you were to sum up all 5 shards, you
would get a combined index that could be too large to fit on any one of
those five machines. So yes, this helps scalability as well as performance,
because all shards can be processed (for example, searched) concurrently.
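To make the idea concrete, here is a toy sketch in plain Python (not elasticsearch code). It assumes documents are assigned to a shard by a stable hash of their id, and that a search fans out to every shard concurrently and merges the partial results. The names shard_for, build_shards, search_shard and search are made up for this illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 5  # same count as the example in this thread


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Assumed routing rule: stable hash of the id, modulo the shard count."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards=NUM_SHARDS):
    """Split one logical index (id -> text) into separate per-shard dicts."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard, term):
    """Search a single shard; each shard only knows its own documents."""
    term = term.lower()
    return [doc_id for doc_id, text in shard.items()
            if term in text.lower().split()]


def search(shards, term):
    """Query all shards concurrently, then merge the partial hit lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_hits = list(pool.map(lambda s: search_shard(s, term), shards))
    return sorted(hit for hits in partial_hits for hit in hits)


docs = {"1": "quick brown fox", "2": "lazy dog", "3": "brown dog"}
shards = build_shards(docs)
print(search(shards, "brown"))
```

Each dict here stands in for one shard living on its own machine; the threads stand in for those machines working on the same query at the same time.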
Just note that elasticsearch has the distributed notion in its DNA since
the very beginning so everything in it is about distributed, concurrent and
possibly [near] real time processing (including index and search
operations). That is why you need index sharding and many other concepts
found in it.
On Thu, Jan 26, 2012 at 2:51 PM, project2501 email@example.com wrote:
So a shard is a 'physical' index, in the lucene sense? And an ES index
is broken down into multiple physical indexes for some reason?
Appreciate the info.
On Jan 25, 11:43 am, Shay Banon kim...@gmail.com wrote:
On Wednesday, January 25, 2012 at 3:35 PM, project2501 wrote:
I've read the online guides and searched for past threads. I'm
trying to get a clear definition of the following terms but having trouble.
- Node - I think this means an instance of ES running on a server.
Yes, an instance of elasticsearch.
- Index - I know what this means in Lucene, but is it identical in ES
or is there more? e.g. logical definition? or physical?
An index is a logical concept that encapsulates specific data. It is
broken down into shards, and shards can have replicas. It also holds the
mapping definitions and specific settings associated with it.
- Shard - I see that indices are created with 'shards'. But what IS a
shard and why does it exist? When I have an index with 5 shards, does
that mean 5 physical lucene indices that are separate?
Yes, and if you have replicas, then more.
- Replica - I read that a shard has a replica. How does that work? If
I have 5 shards, I need 5 replicas for each shard as failover data? Or
1 replica merges the data of 5 shards into itself?
If you have an index with 5 shards and index.number_of_replicas set to
1, then each shard will have a single replica (two copies).
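As a quick check of that arithmetic, here is a small Python sketch. The settings dict only mirrors the two index settings discussed here (index.number_of_shards and index.number_of_replicas); the helper total_shard_copies is a made-up name for illustration, not part of any elasticsearch API.

```python
# Settings as discussed in this thread: 5 shards, 1 replica per shard.
settings = {"index": {"number_of_shards": 5, "number_of_replicas": 1}}


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each shard exists once as a primary plus once per replica."""
    return number_of_shards * (1 + number_of_replicas)


index = settings["index"]
total = total_shard_copies(index["number_of_shards"],
                           index["number_of_replicas"])
print(total)  # 10: 5 primaries + 5 replicas, each one a separate Lucene index
```

So replicas are per shard: with 5 shards and 1 replica you get 5 replica shards, one for each primary, rather than a single replica that merges all 5.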
Maybe there is a simple diagram somewhere showing these relationships.
Reading the API docs to learn these relationships is rather hard. Any