Geolocating shards question

Just a couple of quick questions:

In the scenario discussed in this RavenDB guide (http://ravendb.net/docs/server/scaling-out/sharding),
data from companies across the globe is stored in differently located
shards based on each company's region. So if company A is based in Asia and
company B is based in the UK, all of company A's data would be indexed into
a shard located in Asia, and all of company B's data into a shard located
in the UK.
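
For comparison, the client-side sharding function in that RavenDB scenario
boils down to something like the Python sketch below. The region keys, the
shard URLs and the Company type are all hypothetical, and this is not
RavenDB's actual client API; it only shows the idea that the client decides
where each company's data lives:

```python
from dataclasses import dataclass

# Hypothetical shard endpoints, one per region; these URLs are placeholders.
SHARDS_BY_REGION = {
    "asia": "http://asia-shard.example.com:8080",
    "uk": "http://uk-shard.example.com:8080",
}


@dataclass
class Company:
    name: str
    region: str


def shard_for(company: Company) -> str:
    """Client-side sharding function: pick the shard by the company's region."""
    try:
        return SHARDS_BY_REGION[company.region]
    except KeyError:
        raise ValueError(f"no shard configured for region {company.region!r}") from None


if __name__ == "__main__":
    print(shard_for(Company("Company A", "asia")))
    print(shard_for(Company("Company B", "uk")))
```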

My question is: is this geolocating of shards possible without creating a
different index for each company in Elasticsearch? And if it is, is it
something that needs to be planned before going live with a solution? What
I mean is: if you weren't concerned about geolocating data when you
initially went live, would it be possible to introduce it later when
required?

Regards,
James

--

The design decisions behind sharding are quite different between RavenDB and
Elasticsearch. To name one such difference, ES handles sharding in the
cluster, while sharding in RavenDB is driven by the client itself, and the
server is not at all aware that it is just a shard.
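
To make the contrast concrete: in Elasticsearch the target shard is computed
by the cluster, using the simplified formula documented for the _routing
field, shard_num = hash(_routing) % number_of_primary_shards, where _routing
defaults to the document _id. A minimal Python sketch, assuming an MD5-based
stand-in for Elasticsearch's real routing hash (Murmur3 in recent versions):

```python
import hashlib
from typing import Optional


def stand_in_hash(value: str) -> int:
    # Stand-in for Elasticsearch's internal routing hash; any stable,
    # well-mixed hash shows the same behaviour.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def shard_for_document(doc_id: str, number_of_primary_shards: int,
                       routing: Optional[str] = None) -> int:
    # The routing value defaults to the document _id. The cluster evaluates
    # this when indexing, so the client never picks a server itself.
    key = routing if routing is not None else doc_id
    return stand_in_hash(key) % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ["company-a-1", "company-a-2", "company-b-1"]:
        print(doc_id, "-> shard", shard_for_document(doc_id, 5))
```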

In Elasticsearch the best approach would probably be to have a separate
index per region, with each index sharded according to its actual load
(probably reserving some extra "virtual" shards, i.e. more primary shards
than you currently need, so the index can later spread over more nodes).
You could then control which nodes each index is deployed on using the
include/exclude allocation filters
(http://www.elasticsearch.org/guide/reference/index-modules/allocation.html).
Since the cluster would be split geographically, I wonder whether it makes
sense to keep all regions in one ES cluster at all; that is only useful if
you are going to query and aggregate results across regions.
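
A sketch of the create-index bodies for that setup, built as Python dicts
and printed as JSON. The index names (companies_asia, companies_uk), the
custom node attribute name region and the shard counts are assumptions; the
setting keys are the standard index settings and allocation filter:

```python
import json


def regional_index_body(region: str, primary_shards: int, replicas: int = 1) -> dict:
    """Create-index body for one region's index, pinned to that region's nodes."""
    return {
        "settings": {
            # More primaries than current nodes leaves room to spread out later,
            # since the primary shard count cannot be changed on an existing index.
            "index.number_of_shards": primary_shards,
            "index.number_of_replicas": replicas,
            # Allocation filter: only place shards on nodes whose custom
            # "region" attribute matches (node.region in 0.20-era configs,
            # node.attr.region in recent versions).
            "index.routing.allocation.include.region": region,
        }
    }


if __name__ == "__main__":
    for region in ["asia", "uk"]:
        print(f"PUT /companies_{region}")
        print(json.dumps(regional_index_body(region, primary_shards=10), indent=2))
```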

One important benefit of creating an index per region is the ability to
scale each region independently, instead of having one huge,
geographically distributed index for everything, with no ability to
re-shard it.
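
Why re-sharding is the sticking point: because the shard is derived from the
hash modulo the primary shard count, changing that count remaps most
documents. A small Python illustration, again with an MD5-based stand-in
hash rather than Elasticsearch's own:

```python
import hashlib


def stand_in_hash(value: str) -> int:
    # Stand-in for Elasticsearch's routing hash; any well-mixed hash will do.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Share of documents whose hash-modulo shard changes with the shard count."""
    moved = sum(
        1 for doc_id in doc_ids
        if stand_in_hash(doc_id) % old_shards != stand_in_hash(doc_id) % new_shards
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    print(f"{fraction_moved(ids, 5, 6):.1%} of documents would change shard (5 -> 6)")
```

With a uniform hash, going from 5 to 6 shards keeps a document in place only
when h mod 30 < 5, so roughly five in six documents would move; hence a new
index plus a reindex rather than an in-place change.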

You always have to plan sharding properly with Elasticsearch, and that goes
for RavenDB as well (where you'd have to write a good sharding function).
In your scenario, you can start with one huge index and reindex into
separate, geographically distributed indexes at a later time. Reindexing
with ES is easy enough, since documents can be rebuilt from their stored
_source, so may the _source be with you :)
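
A sketch of that later split, assuming a source index called companies and
a keyword-style region field on every document (both hypothetical). It
builds bodies for the _reindex API, which only exists in Elasticsearch 2.3
and later; in 2013 the equivalent was a scan/scroll over the old index
feeding bulk requests into the new ones. Either way the copies are rebuilt
from each document's _source:

```python
import json

REGIONS = ["asia", "uk"]


def reindex_body(source_index: str, dest_index: str, region: str) -> dict:
    """_reindex request copying one region's documents into its own index."""
    return {
        "source": {
            "index": source_index,
            # Assumes each document carries a keyword-like "region" field.
            "query": {"term": {"region": region}},
        },
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    for region in REGIONS:
        print("POST /_reindex")
        print(json.dumps(reindex_body("companies", f"companies_{region}", region), indent=2))
```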

On Sun, Jan 27, 2013 at 2:42 PM, james.lewis@7digital.com wrote the question above.

--

That was my initial plan: start with one index and reindex later on if I
need to (I'm using aliases anyway).
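
With aliases in place, the cut-over after such a reindex can be a single
_aliases request. A Python sketch that builds that body; the alias and index
names are hypothetical:

```python
import json


def alias_swap(alias: str, old_index: str, new_indices: list) -> dict:
    """Body for POST /_aliases moving an alias from one index to several."""
    actions = [{"remove": {"index": old_index, "alias": alias}}]
    actions += [{"add": {"index": index, "alias": alias}} for index in new_indices]
    # All actions in one _aliases request are applied atomically, so searches
    # against the alias never see a moment with no index behind it.
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_swap("companies", "companies_v1", ["companies_asia", "companies_uk"])
    print("POST /_aliases")
    print(json.dumps(body, indent=2))
```

Searching through an alias that spans several indexes works fine; writes
still have to target a concrete regional index (recent versions add an
is_write_index flag for writing through an alias).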

I did just have a thought, though: if I were more concerned with time to
first byte on search, then I wouldn't care which continent I was indexing
to; I would just need to make sure there was a replica of the data on the
continent where the search request was made. So if someone searches from
the US, their request would be routed to a US node holding a replica of
that data (even though it was indexed on a node hosted in, for example, the
UK).
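
One way to express that in Elasticsearch is shard allocation awareness with
forced awareness values, so the copies of each shard are spread across
regions. A hedged Python sketch of the PUT /_cluster/settings body; the
attribute name region and the region values uk and us are assumptions:

```python
import json


def forced_awareness_settings(regions: list) -> dict:
    """Body for PUT /_cluster/settings spreading shard copies across regions."""
    return {
        "persistent": {
            # Nodes are tagged with a custom "region" attribute (hypothetical name).
            "cluster.routing.allocation.awareness.attributes": "region",
            # Forced awareness: never put all copies of a shard in one region,
            # even if nodes in another region are temporarily missing.
            "cluster.routing.allocation.awareness.force.region.values": ",".join(regions),
        }
    }


if __name__ == "__main__":
    print("PUT /_cluster/settings")
    print(json.dumps(forced_awareness_settings(["uk", "us"]), indent=2))
```

With number_of_replicas of at least one per extra region (here 1), each
region should hold a copy of every shard. To keep a search on the nearby
copy you would send it to a node in that region and, if needed, use the
search preference parameter (for example _local); whether awareness alone
makes searches favour same-region copies has varied between Elasticsearch
versions.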

Thanks a lot for confirming the differences between RavenDB and ES. I was
envisioning that a similar sharding function would be required in
Elasticsearch, but obviously not!

Thanks,
James

On Sun, Jan 27, 2013 at 1:42 PM, Itamar Syn-Hershko (itamar@code972.com) wrote the reply above.