Hi Shay
I have a few questions regarding sharding in ES:
- How is sharding handled in ES? Basically, how is a shard key
determined and how are the documents evenly distributed across various
shards?
- How do I specify which shard is replicated where? Let's say, for
example, I have 3 nodes/boxes running ES, with 2 replicas and 3 shards.
So I want:
a) Server A -> shard 1, shard 2
b) Server B -> shard 2, shard 3
c) Server C -> shard 1, shard 3
How do I set up this configuration?
- When I add another node, say Server D, to the above and change the
number of shards to 4 and replicas to 3, so that my new configuration
is:
a) Server A -> shard 1, shard 2, shard 4
b) Server B -> shard 2, shard 3, shard 1
c) Server C -> shard 1, shard 3, shard 4
d) Server D -> shard 2, shard 3, shard 4
Would the data get rebalanced across the shards at once, or would that
happen incrementally over time?
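To make the two layouts above concrete, here is a small Python sketch of
what I am after. It is purely illustrative and not ES configuration: the
variable and function names are my own, and it only writes the desired
placement down as data and counts how many servers hold each shard (2 in
the first layout, 3 in the second, which is what I mean by "replicas"
here).

```python
from collections import Counter

# Desired placement, copied from the lists above (server -> shards it holds).
# These names are hypothetical; nothing here is an ES setting.
layout_3_nodes = {
    "A": [1, 2],
    "B": [2, 3],
    "C": [1, 3],
}
layout_4_nodes = {
    "A": [1, 2, 4],
    "B": [2, 3, 1],
    "C": [1, 3, 4],
    "D": [2, 3, 4],
}


def copies_per_shard(layout):
    """Count how many servers hold a copy of each shard."""
    counts = Counter()
    for shards in layout.values():
        counts.update(shards)
    return dict(counts)


print(copies_per_shard(layout_3_nodes))
print(copies_per_shard(layout_4_nodes))
```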
Regarding the gateway (this is more of a clarification): as of now,
with the shared fs folder (NFS on Unix boxes), I understand that all
servers in the cluster write to the same shared folder on NFS. This
definitely makes backing up indexes easy. Now, with the upcoming
local gateway, can I still configure the folders the shards are
written to? For example:
a) Server A -> shard 1 (folder /data/shardA1), shard 2 (folder /data/shardA2)
b) Server B -> shard 2 (folder /data/shardB2), shard 3 (folder /data/shardB3)
c) Server C -> shard 1 (folder /data/shardC1), shard 3 (folder /data/shardC3)
If I can, how would I set it up? If not, this would be highly
desirable, since I would be able to go back to a backed-up copy of a
shard in case that specific shard gets corrupted.
Thanks for your time.
Regards
Dipamay