Data Locality and Sharded Indexes


(Tony Chong) #1

Hello,

I read in one of the posts that splitting an index into more shards
generally improves indexing performance. Is this because all shards of the
index are read simultaneously when doing a search?

#2: If I have 4 nodes (node1, 2, 3, and 4) and 1 index with 4 shards and 0
replicas, and I run a search against node1 for data that resides on node4,
how will this work? Or does the shard on a box only contain the data it
has locally?

Sorry if this doesn't make sense. New to elasticsearch.

Tony


(Shay Banon) #2

If you have 4 nodes and 4 shards with no replicas, then you will have one
shard allocated per node. When you execute a search, it is broadcast to all
shards and executed on each one ("locally"), and the results are then merged
on the node that received the request.
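
To make the broadcast-and-merge flow concrete, here is a toy sketch in plain
Python. It is not Elasticsearch code: the node names, the (score, doc_id)
data, and the functions search_shard and search are all made up for
illustration, and real scoring is far more involved.

```python
import heapq

# Hypothetical toy data: one shard per node, each holding (score, doc_id) pairs.
shards = {
    "node1": [(0.9, "a"), (0.2, "b")],
    "node2": [(0.7, "c")],
    "node3": [(0.8, "d"), (0.1, "e")],
    "node4": [(0.95, "f")],
}

def search_shard(docs, size):
    """Run the query 'locally' on one shard and return its top `size` hits."""
    return heapq.nlargest(size, docs)

def search(shards, size):
    """Broadcast to every shard, then merge the per-shard results."""
    partial = [search_shard(docs, size) for docs in shards.values()]
    merged = heapq.nlargest(size, (hit for hits in partial for hit in hits))
    return [doc_id for _score, doc_id in merged]

# The caller gets hits from every node, no matter which node it asked.
top3 = search(shards, size=3)
```

Each shard only ever looks at its own documents; the merge step is what lets
a request sent to node1 return data that lives on node4.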

More shards improve indexing performance if you have enough machines to
allocate the shards across. This is simply because an indexing operation,
unlike a search, goes to just one shard, so the indexing load ends up
split across more nodes.
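
As a rough illustration of why indexing spreads out, the sketch below picks
one shard per document by hashing its id. This is an assumption-laden toy:
zlib.crc32 is only a stand-in for whatever hash Elasticsearch uses, and
shard_for and the doc ids are hypothetical names.

```python
import zlib
from collections import Counter

NUM_SHARDS = 4

def shard_for(routing_key, num_shards=NUM_SHARDS):
    """Map a key to exactly one shard (crc32 is a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards

# Each document lands on a single shard, so 1000 index operations are
# divided among the shards (and therefore among the nodes holding them).
docs_per_shard = Counter(shard_for(f"doc-{i}") for i in range(1000))
```

Printing docs_per_shard shows how many of the 1000 documents each shard
received; no shard has to handle all of them.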

Note that with routing, you can also control which shards a search will
execute on.
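
A minimal sketch of the routing idea, again as a toy model rather than the
real implementation: documents indexed with the same routing value end up on
the same shard, so a search given that value only needs to visit one shard.
The names index, search, and shard_for, and the crc32 hash, are assumptions
made for this example.

```python
import zlib

NUM_SHARDS = 4
shards = [[] for _ in range(NUM_SHARDS)]

def shard_for(routing):
    """Map a routing value to one shard (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS

def index(doc, routing):
    """Store the document on the single shard chosen by its routing value."""
    shards[shard_for(routing)].append(doc)

def search(predicate, routing=None):
    """Search every shard, or only the routed shard if a routing value is given."""
    targets = range(NUM_SHARDS) if routing is None else [shard_for(routing)]
    hits = [doc for s in targets for doc in shards[s] if predicate(doc)]
    return hits, len(targets)

index({"user": "tony", "msg": "hello"}, routing="tony")
index({"user": "tony", "msg": "thanks"}, routing="tony")
index({"user": "shay", "msg": "reply"}, routing="shay")

def by_tony(doc):
    return doc["user"] == "tony"

routed_hits, routed_shards = search(by_tony, routing="tony")  # visits 1 shard
all_hits, all_shards = search(by_tony)                        # visits all 4
```

Both searches return the same two documents; the routed one just touches
fewer shards to get them.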

(In reply to Tony Chong's message of Mon, Aug 8, 2011 at 11:47 PM.)


(Tony Chong) #3

Thanks Shay. Exactly the information I was looking for!

