A data set is sharded when it's broken into pieces and distributed across
nodes. A simple (but flawed) example would be sharding a data set about
people by name across 26 nodes, one for each letter of the alphabet.
Unfortunately, that scheme balances badly: your Z node will be underused
and your S node might be swamped.
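To make the imbalance concrete, here is a toy sketch that routes a handful
of names by first letter and, for comparison, by a stable hash modulo the
shard count (a common alternative, not necessarily what any particular
engine does). The name list and both helper functions are hypothetical,
made up for illustration.

```python
import hashlib
import string
from collections import Counter

# Hypothetical sample data: S-names are common, Z-names are rare.
NAMES = ["Sarah", "Sam", "Steve", "Sophie", "Simon", "Alice", "Bob", "Zoe"]

def letter_shard(name: str) -> int:
    """Route a name to one of 26 shards by its first letter (A=0 ... Z=25)."""
    return string.ascii_uppercase.index(name[0].upper())

def hash_shard(name: str, num_shards: int) -> int:
    """Route a name by a stable (non-randomized) hash modulo the shard count."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

letter_counts = Counter(letter_shard(n) for n in NAMES)
hash_counts = Counter(hash_shard(n, 26) for n in NAMES)

print("by letter:", dict(letter_counts))  # the S shard (18) gets 5 of 8 names
print("by hash:  ", dict(hash_counts))
```

Letter routing piles five of the eight names onto the S shard. Hash routing
ignores which letters are popular, so it tends to spread keys more evenly as
the data set grows, although with only eight names some collisions are still
likely.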
A shard is replicated when there's more than one copy.
Shards enable parallel processing on separate nodes. Replicas improve
throughput, since you have more choices about where to process data, and
they improve availability, since losing one node doesn't lose the only
copy of a shard. Once you have the basics, there are many good
discussions of sharding / replication strategies on this list.
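A toy model of how shards and replicas might be laid out across nodes. This
is an illustrative round-robin placement under assumed names (`place_shards`,
`pick_copy`, `surviving_copies`, `node-a` and so on are all hypothetical),
not any real engine's allocator. It keeps every copy of a shard on a
different node, spreads reads over the copies, and checks that every shard
still has a copy after one node fails.

```python
from typing import Dict, List

def place_shards(num_shards: int, num_replicas: int,
                 nodes: List[str]) -> Dict[int, List[str]]:
    """Assign each shard's copies to distinct nodes, round-robin.

    Returns shard id -> nodes holding a copy (first entry is the primary).
    """
    copies = 1 + num_replicas
    if copies > len(nodes):
        # Two copies on one node would not survive that node failing.
        raise ValueError("need at least one node per copy of a shard")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(copies)]
        for shard in range(num_shards)
    }

def pick_copy(placement: Dict[int, List[str]], shard: int, request_no: int) -> str:
    """Spread reads of one shard over all of its copies (more choices = more throughput)."""
    holders = placement[shard]
    return holders[request_no % len(holders)]

def surviving_copies(placement: Dict[int, List[str]],
                     failed_node: str) -> Dict[int, List[str]]:
    """Copies of each shard that remain after one node goes down."""
    return {s: [n for n in h if n != failed_node] for s, h in placement.items()}

nodes = ["node-a", "node-b", "node-c"]
placement = place_shards(num_shards=3, num_replicas=1, nodes=nodes)
for shard, holders in placement.items():
    print(f"shard {shard}: primary={holders[0]} replicas={holders[1:]}")

print([pick_copy(placement, 0, r) for r in range(4)])
after_failure = surviving_copies(placement, "node-b")
print(all(after_failure.values()))  # True: every shard still has a copy
```

With three shards and one replica on three nodes, each node holds two shard
copies, and any single node can fail without making a shard unavailable.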
On Tue, Jun 5, 2012 at 7:14 PM, Jérome firstname.lastname@example.org wrote:
I don't understand how shards and replicas work.
I read the docs but don't understand what a shard and a replica are.
A shard seems to be a piece of the original data, and a replica a backup
copy of that shard?
I've seen this:
And could I also get some explanation of N°14 (especially the diagram)?
Thanks for your help.