"Primary" VS "secondary" nodes

Hi,

I was wondering if it was possible to have a cluster on which all nodes share data, but only a select few actually run search queries?

I would like to do that for the following reason: for weird legacy reasons, we currently have our app servers in one data center (let's call it A) and our ES cluster in another (let's call it B). We can't move any of those servers, for many practical reasons (notably, A does not allow for big enough servers for ES, RAM-wise). But it's a real problem, since any connectivity problem between the 2 data centers means downtime for us.

So I was thinking of having smaller ES servers in A that would only keep up with the data updates when running normally, and letting the nodes in B handle the actual search queries. Then, if the connection to B is lost, the nodes in A would still be able to provide some (admittedly degraded) service.

Thanks for telling me what you think!

Jean

I can think of a couple of ways to make sure you only query machines in one datacenter, like using certain IP ranges or a TransportClient. But what I really wonder is whether, in your case, some kind of message queue would work better than spanning a cluster over 2 datacenters with a flaky connection. I think you'd eventually have less trouble with that.

simon
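
To illustrate the "certain IP ranges" idea, here is a minimal sketch of picking which nodes a client connects to based on their address. The addresses and CIDR blocks are made up for the example (10.1.0.0/16 for data center A, 10.2.0.0/16 for B), and `hosts_in_range` is a hypothetical helper, not part of any Elasticsearch client. Note that this only controls which node receives the request; that node may still involve other nodes holding the relevant shards.

```python
import ipaddress


def hosts_in_range(hosts, cidr):
    """Return the hosts whose IP address falls inside the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    return [h for h in hosts if ipaddress.ip_address(h) in network]


# Hypothetical addresses: 10.1.0.0/16 for data center A, 10.2.0.0/16 for B.
all_nodes = ["10.1.0.5", "10.2.0.7", "10.2.0.8"]

# Only these hosts would be handed to the client as connection targets.
nodes_in_b = hosts_in_range(all_nodes, "10.2.0.0/16")
```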

(In reply to Jean Rougé's message of Monday, August 19, 2013 11:32:38 AM UTC+2, shown in full above.)

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Primary-VS-secondary-nodes-tp4039802.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Simon,

Thanks a lot for your answer. However, I'm not sure I get what you mean.

First, by "querying certain IP ranges": if you query one node in a cluster, that doesn't mean that node is going to handle your query on its own; rather, it's going to make other nodes take part in computing the result, right? Or did I completely miss your point?

As for your TransportClient suggestion, I don't quite see how to use that to make sure the nodes in A do keep track of the data updates made during normal operation. Could you please elaborate a little more?

Finally, the message queue: do you mean having 2 separate clusters, queuing all the update operations in a message queue service, and then replaying them on both clusters? I had thought of that, but was hoping there was a simpler, more vanilla solution. It's admittedly pretty simple to code, but I try to keep our architecture as simple as possible; it's already quite a mess...

Looking forward to your answer, I really appreciate your help!

Cheers,

Jean
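
For what it's worth, the message-queue variant described above (two separate clusters, every update queued and replayed on both) could look roughly like this. It is only a sketch under loud assumptions: the queue is an in-memory list rather than a real message queue service, each "cluster" is a plain dict, and `ReplayQueue` and `make_cluster` are hypothetical names invented for the example. The point it shows is the per-cluster read offset, which lets a cluster that was unreachable catch up later.

```python
class ReplayQueue:
    """Append-only log of update operations with one read offset per cluster."""

    def __init__(self, consumers):
        self.log = []                                  # every update, in arrival order
        self.offsets = {name: 0 for name in consumers}

    def publish(self, op):
        self.log.append(op)

    def drain(self, name, apply, reachable=True):
        """Apply pending operations to one cluster; return how many were applied."""
        if not reachable:                              # e.g. the link to that DC is down
            return 0
        start = self.offsets[name]
        for op in self.log[start:]:
            apply(op)
            self.offsets[name] += 1                    # advance only after a successful apply
        return self.offsets[name] - start


def make_cluster():
    """Hypothetical stand-in for a cluster: a dict of doc id -> document."""
    store = {}

    def apply(op):
        if op["action"] == "index":
            store[op["id"]] = op["doc"]
        elif op["action"] == "delete":
            store.pop(op["id"], None)

    return store, apply


queue = ReplayQueue(["A", "B"])
store_a, apply_a = make_cluster()
store_b, apply_b = make_cluster()

queue.publish({"action": "index", "id": "1", "doc": {"title": "hello"}})
queue.drain("A", apply_a)
queue.drain("B", apply_b, reachable=False)  # link to B is down, nothing is applied
queue.publish({"action": "index", "id": "2", "doc": {"title": "world"}})
queue.drain("A", apply_a)
queue.drain("B", apply_b)                   # B catches up on both operations
```

With a real queue the offsets would live in the queue service or in each consumer, but the replay logic stays the same.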

Sorry for not being verbose enough here; I really missed what I wanted to say anyway. My IP-based / TransportClient solution would involve 2 indices, with one index restricted to nodes in only one DC. That way you can have the data locally but maintain a single cluster. All you need to do is index it twice, and then make sure you only talk to machines in the one DC so that calls stay local.

Is this clearer? It's hackish, but it could work.

simon
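
A rough sketch of the two-index idea, under stated assumptions: nodes carry a custom attribute (called `dc` here, which would have to be set in each node's configuration), the restricted index is pinned to one data center with the `index.routing.allocation.include.<attribute>` index setting, and the application writes every document twice. The index names `docs_main` and `docs_local` and the `send` callable are hypothetical; `send` stands in for whatever client actually performs the request.

```python
import json


def allocation_settings(attribute, value):
    """Settings body that keeps an index's shards on nodes tagged attribute=value."""
    return {"index.routing.allocation.include." + attribute: value}


def double_index(doc_id, doc, send, indices=("docs_main", "docs_local")):
    """Write the same document to both indices through a caller-supplied sender."""
    body = json.dumps(doc, sort_keys=True)
    return [send(index, doc_id, body) for index in indices]


# Body for an update-settings request on the restricted index (hypothetical values).
local_only = allocation_settings("dc", "a")
```

The settings body would be applied to the restricted index only, so its shards stay in one data center while the other index is free to live wherever the cluster puts it.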

(In reply to Jean Rougé's message of Tuesday, August 20, 2013 12:04:44 PM UTC+2, shown in full above.)

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Primary-VS-secondary-nodes-tp4039802p4039860.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.
