Architecture and scalability across remote groups

Hi,

I have a task to distribute indexes across remote groups. Each group has
one node and should work with their own indices. It would also be good if
one group could execute searches across the other groups' indexes.
So, I have some questions (see attached image).

For option 1: (Central Index, same index across all remote groups)

  1. Can I dynamically increase the number of shards per group, let's say
    if one of them gets more users?
  2. If one day I have a new group, can I reallocate the other groups'
    replicas equally? Do I have to update the config for each group?
  3. When I search, can I use routing + aliases to specify a group (local or
    remote) and search across all shards located within that group?
  4. Can I still search across the whole index if one out of three groups is
    down?

For option 2:

  1. How stupid is it to have nodes == replicas?

Thanks.

--

Hi Benis,

On Thu, Jan 24, 2013 at 7:41 PM, Benis Dystrov dbystrov@gmail.com wrote:

Hi,

I have a task to distribute indexes across remote groups. Each group has
one node and should work with their own indices.

Take a look at Index Shard Allocation and Shard Allocation Awareness in the
Elasticsearch documentation.

There are a few options there, but I'm not sure I understand your use-case
well enough to suggest one in particular.
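
A minimal sketch of what those two settings could look like as request
bodies, assuming each node is started with a custom attribute (here a
hypothetical `group` attribute, e.g. `node.group: ny`; the exact attribute
syntax depends on the Elasticsearch version). The helper names are
illustrative, and the bodies are only built here, not sent to a cluster.

```python
import json


def index_allocation_settings(group):
    """Body for PUT /<index>/_settings.

    Asks Elasticsearch to keep the index on nodes whose (assumed)
    `group` attribute matches the given value.
    """
    return {"index.routing.allocation.include.group": group}


def awareness_settings(attribute="group"):
    """Body for PUT /_cluster/settings.

    Enables shard allocation awareness on the (assumed) node attribute,
    so copies of a shard are spread across different attribute values.
    """
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": attribute
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_allocation_settings("ny")))
    print(json.dumps(awareness_settings()))
```

Allocation filtering suits "each group keeps its own indices"; awareness
suits "one index with copies spread over all groups".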

It would also be good if one group could execute searches across the other
groups' indexes.
So, I have some questions (see attached image).

For option 1: (Central Index, same index across all remote groups)

  1. Can I dynamically increase the number of shards per group, let's say
    if one of them gets more users?

You can't dynamically increase the number of shards per index. But you can
increase the number of indices that are stored in a single group, which
effectively should do what you need.
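
One way to sketch "more indices per group": create an extra index for the
busy group and put it behind an alias named after the group. The naming
scheme (`ny_1`, `ny_2`, ...) and the helper below are assumptions for
illustration; only the request bodies are built.

```python
def add_index_to_group(group, sequence, shards=1, replicas=0):
    """Return (index_name, create_body, alias_body) for a new group index.

    create_body is meant for PUT /<index_name>,
    alias_body is meant for POST /_aliases.
    """
    if sequence < 1:
        raise ValueError("sequence must be >= 1")
    index_name = "%s_%d" % (group, sequence)
    create_body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    # The alias groups all indices of one location under a single name.
    alias_body = {
        "actions": [{"add": {"index": index_name, "alias": group}}]
    }
    return index_name, create_body, alias_body


if __name__ == "__main__":
    print(add_index_to_group("ny", 2, shards=2))
```

Searches can then target the alias (`ny`), while indexing still has to
name a concrete index, since the alias points to more than one.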

  2. If one day I have a new group, can I reallocate the other groups'
    replicas equally?

By default, ES tries to allocate an equal number of shards to each node so
that your nodes are evenly loaded. It also takes your allocation settings
(as suggested above) into account. If that is not enough, you can use the
Cluster Reroute API to allocate shards manually.
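
A minimal sketch of a reroute request body with a `move` command. The
index and node names are placeholders, and the helpers are hypothetical
wrappers around the JSON that would be sent to POST /_cluster/reroute.

```python
def move_shard_command(index, shard, from_node, to_node):
    """Build one `move` command for the Cluster Reroute API."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_body(commands):
    """Wrap a sequence of commands into a reroute request body."""
    return {"commands": list(commands)}


if __name__ == "__main__":
    body = reroute_body([move_shard_command("central", 0, "node_ny", "node_la")])
    print(body)
```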

Do I have to update the config for each group?

It's all done via the API, so that shouldn't be a problem.

  3. When I search, can I use routing + aliases to specify a group (local or
    remote) and search across all shards located within that group?

You can do that. Also, if a group is a single node, you can specify that
node in your Search Preferences.
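
Both variants can be sketched as follows: an alias that carries a routing
value, and a search URL that pins the request to one node through the
`preference` parameter. The alias name, routing value, and node id are
placeholders, and the availability of `_only_node` should be checked for
your Elasticsearch version.

```python
from urllib.parse import urlencode


def routed_alias_action(index, alias, routing):
    """One `add` action for POST /_aliases that attaches a routing value."""
    return {"add": {"index": index, "alias": alias, "routing": routing}}


def search_path(index_or_alias, node_id=None, routing=None):
    """Build a search path, optionally pinned to a node or a routing value."""
    params = {}
    if node_id:
        # Restrict the search to shard copies held by this node.
        params["preference"] = "_only_node:" + node_id
    if routing:
        params["routing"] = routing
    path = "/%s/_search" % index_or_alias
    if params:
        path += "?" + urlencode(params)
    return path


if __name__ == "__main__":
    print(routed_alias_action("central", "ny", "ny"))
    print(search_path("central", node_id="abc123"))
```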

  4. Can I still search across the whole index if one out of three groups is
    down?

You should be able to do it using the same trick, or by using
routing/aliases.

For option 2:

  1. How stupid is it to have nodes == replicas?

It seems appropriate if you want to keep the number of nodes down and at
the same time you want to avoid traffic between nodes. Then if you have a
complete set of data on each node, you can just execute the search on that
node, using Search Preference.

The obvious downside is that the complete set of data will need to "fit" in
one node.
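
For a complete copy on every node, each shard needs one copy per node,
i.e. nodes - 1 replicas. A small sketch of that arithmetic and of the
matching update-settings body (helper names are illustrative):

```python
def replicas_for_full_copy(node_count):
    """Replicas needed so that every node holds a copy of every shard."""
    if node_count < 1:
        raise ValueError("node_count must be >= 1")
    # One copy is the primary; the rest are replicas.
    return node_count - 1


def replica_settings(node_count):
    """Body for PUT /<index>/_settings with the computed replica count."""
    return {"index": {"number_of_replicas": replicas_for_full_copy(node_count)}}


if __name__ == "__main__":
    print(replica_settings(4))
```

Elasticsearch also has an `index.auto_expand_replicas` setting that can
adjust the replica count as nodes join or leave, which may be worth a look
instead of setting the number by hand.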

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

as workgroups are located remotely (let's say NY, LA, MI, WA)

  1. So, if I have remote workgroups, is it better to have their own indexes
    than one central?
  2. And if I have one central index, will I get the answer that 2 of 10
    workgroups are not answering (offline)?
  3. Can I control the latency of getting search results from remote indexes?

Thanks for helping.

--

Hello,

On Tue, Jan 29, 2013 at 2:40 PM, Benis Dystrov dbystrov@gmail.com wrote:

as workgroups are located remotely (let's say NY, LA, MI, WA)

  1. So, if I have remote workgroups, is it better to have their own indexes
    than one central?

I think it's easier to manage if locations have their own indices, so you
can easily search within the same location. And you can still search on all
locations by directing your queries to multiple indices.
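
Directing a query to multiple indices just means listing them,
comma-separated, in the search path. A minimal sketch, with hypothetical
per-location index names:

```python
def multi_index_search_path(indices):
    """Build a search path that targets several indices at once."""
    names = [name for name in indices if name]
    if not names:
        raise ValueError("at least one index name is required")
    return "/" + ",".join(names) + "/_search"


if __name__ == "__main__":
    # Local search for one location, then a search across all locations.
    print(multi_index_search_path(["ny"]))
    print(multi_index_search_path(["ny", "la", "mi", "wa"]))
```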

  2. And if I have one central index, will I get the answer that 2 of 10
    workgroups are not answering (offline)?

Yes. If there's still a complete set of data available (via replicas), you
should still get the query results. If only some shards are available,
you'll see that in the query results. Something like:

"_shards":{"total":5,"successful":3,"failed":2}

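A client can check that `_shards` section to decide whether the results
are complete. A minimal sketch, assuming the response has already been
parsed from JSON (the `shard_report` helper is illustrative):

```python
import json


def shard_report(response):
    """Summarize the `_shards` section of a search response."""
    shards = response["_shards"]
    missing = shards["total"] - shards["successful"]
    return {
        "complete": missing == 0 and shards["failed"] == 0,
        "missing": missing,
    }


if __name__ == "__main__":
    sample = json.loads('{"_shards":{"total":5,"successful":3,"failed":2}}')
    print(shard_report(sample))
```
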
  3. Can I control the latency of getting search results from remote
    indexes?

You can specify a timeout for the whole search.
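
A minimal sketch of a search body with a `timeout`, plus a check of the
`timed_out` flag in the response. The 500ms value and the helper names are
assumptions; with a timeout, the search may return whatever hits were
collected in time rather than failing.

```python
def search_body_with_timeout(query, timeout="500ms"):
    """Build a search request body that carries a search timeout."""
    return {"timeout": timeout, "query": query}


def timed_out(response):
    """True if the response reports that the search hit its timeout."""
    return bool(response.get("timed_out", False))


if __name__ == "__main__":
    body = search_body_with_timeout({"match_all": {}})
    print(body)
    print(timed_out({"timed_out": True, "hits": {"total": 3}}))
```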

You can also control how long a timeout is tolerated before a node is
"kicked out" of the cluster, through discovery.zen.fd.ping_timeout.

But I wouldn't set that value too low, otherwise nodes will probably go on
and off the cluster all the time, which creates a lot of load due to
rebalancing.
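
To get a feel for the trade-off, here is a rough sketch that builds the
zen fault-detection settings and estimates how long an unresponsive node
may stay in the cluster. The estimate (retries times timeout) is a
simplifying assumption, not the exact algorithm, and the default values
shown should be verified against the documentation for your version.

```python
def fault_detection_settings(ping_interval_s=1, ping_timeout_s=30, ping_retries=3):
    """Settings as they could appear in elasticsearch.yml (as a dict)."""
    return {
        "discovery.zen.fd.ping_interval": "%ds" % ping_interval_s,
        "discovery.zen.fd.ping_timeout": "%ds" % ping_timeout_s,
        "discovery.zen.fd.ping_retries": ping_retries,
    }


def rough_detection_time_s(ping_timeout_s=30, ping_retries=3):
    """Approximate seconds before a silent node is dropped.

    Assumes every retry waits for the full ping timeout.
    """
    return ping_timeout_s * ping_retries


if __name__ == "__main__":
    print(fault_detection_settings())
    print(rough_detection_time_s())
    # A much lower timeout reacts faster but risks dropping slow remote nodes.
    print(rough_detection_time_s(ping_timeout_s=5))
```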

Best regards,
Radu

--

Thank you very much. I've got everything I need.
