Deploy development, test and live instances

We are designing our first elasticsearch deployment. We need a development
'instance', test 'instance' and live 'instance'. I use the word 'instance'
as I'm not sure at this stage whether it should be a node or cluster.

Two essential requirements for us are:

  1. No cross talk between 'instances' (i.e. no queries for the live
    'instance' being routed to test or development)
  2. The ability to synchronise data from the development 'instance' to the
    test and then live 'instances' on demand, not automatically, i.e. we want to
    develop against a particular dataset, push that data to test, conduct
    formal testing and then push that data to our live instance if and only if
    the tests pass.

We are evaluating two options:

  1. One cluster, three nodes, shards replicated across nodes:

If we were to have one cluster, with three nodes, I think we could prevent
cross talk using routing (and possibly preference?) but I don't see a way
of preventing elasticsearch from synchronising data from
the configured master node (on development) to the test and live shards if
they are part of the same cluster. I suppose we could remove test and live
from the cluster and then rejoin them to carry out the updates? Is there a
more natural way that we can achieve this 'update on demand'? Am I correct
that we can prevent cross talk between the nodes through configuration?

  2. Three clusters, each with one node:

It seems that some people
(http://elasticsearch-users.115913.n3.nabble.com/Replicating-data-from-one-ES-cluster-to-another-td4034172.html)
use rsync (http://rsync.samba.org/) to update the underlying Lucene index
files between clusters once an update is complete, so we could have
development, test and live clusters and use rsync to push data between them
as required. I've worked at a relatively low level with Lucene before and
am broadly happy that this approach would work, but can anyone provide any
experience of trying it?
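
A minimal sketch of the rsync push described above, not a tested procedure. The host name and data paths are hypothetical, and it assumes nothing is writing to the source index and the target node is stopped while files are copied, since copying Lucene files that are still changing may leave the copy inconsistent. The command defaults to a dry run.

```python
import shlex
import subprocess

def rsync_command(src_dir, dest_host, dest_dir, dry_run=True):
    """Build the rsync argv that mirrors src_dir onto dest_host:dest_dir."""
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    # A trailing slash on the source copies the directory's contents.
    src = src_dir.rstrip("/") + "/"
    dest = dest_host + ":" + dest_dir.rstrip("/") + "/"
    return cmd + [src, dest]

def push(src_dir, dest_host, dest_dir, dry_run=True):
    """Run the rsync command; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(src_dir, dest_host, dest_dir, dry_run), check=True)

if __name__ == "__main__":
    # Hypothetical paths and host; this only prints the command.
    print(shlex.join(rsync_command("/data/es/dev", "test-es", "/data/es/test")))
```

Calling `push(..., dry_run=False)` would perform the copy; the target node would then need to be restarted to pick up the files.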

I've searched for an elasticsearch-to-elasticsearch river but have not
been able to find one. If one exists that would obviously be a good option.
Does anyone know of one?
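
In the absence of such a river, one alternative is to read documents from the source cluster with a search and write them to the target with the bulk API. The sketch below covers only the conversion step: it turns search hits into a bulk request body, with one action line and one source line per document. The function name and sample data are hypothetical, and the HTTP calls themselves are left out.

```python
import json

def hits_to_bulk(hits, target_index):
    """Convert search hits into a newline-delimited bulk 'index' request body."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        if "_type" in hit:
            # Older Elasticsearch versions need a mapping type per document.
            action["index"]["_type"] = hit["_type"]
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(hit["_source"], sort_keys=True))
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)

if __name__ == "__main__":
    sample = [{"_id": "1", "_type": "doc", "_source": {"title": "example"}}]
    print(hits_to_bulk(sample, "products_test"), end="")
```

Because the copy only runs when you invoke it, this fits the "on demand, not automatically" requirement.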

Broadly, I can't believe that our usage pattern of separated dev -> test ->
live with on demand updates has not been used elsewhere, and that this
problem hasn't been solved before. Does anyone have any experience of
solving this? Have we missed something obvious here?

Thanks

Alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Alex,

I think you could have both:

  1. Create one index per platform (I mean dev, test, live)
  2. Have 3 clusters.

For the 1st one, the main problem I see is when it comes to upgrading a cluster (elasticsearch version).
I'm not sure you want to test it in production, right?
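
For the one-index-per-platform option, a search request names its index in the URL path, so a query sent to the live index is not run against the dev or test indices. A small sketch of that addressing; the index names and host are assumptions for illustration.

```python
# Hypothetical index names, one per platform.
INDEX_FOR = {"dev": "products_dev", "test": "products_test", "live": "products_live"}

def search_url(env, host="http://localhost:9200"):
    """Return the _search endpoint for the index belonging to one environment."""
    if env not in INDEX_FOR:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{host}/{INDEX_FOR[env]}/_search"

if __name__ == "__main__":
    for env in ("dev", "test", "live"):
        print(env, search_url(env))
```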

For the 2nd one, which is the one I prefer, I'd set up only one node for dev, one or three for test, and at least three for production,
unless you don't mind having downtime.
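
With separate clusters, the setting that keeps them apart is `cluster.name`: a node only joins a cluster whose name matches its own. The sketch below renders a minimal `elasticsearch.yml` fragment per environment. The cluster and node names are hypothetical, and a real deployment would add network and discovery settings as well.

```python
ENVIRONMENTS = ("dev", "test", "live")

def render_config(env, node_number=1):
    """Render a minimal elasticsearch.yml fragment for one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return (
        f"cluster.name: es-{env}\n"
        f"node.name: es-{env}-{node_number}\n"
    )

if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(render_config(env))
```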

An ES river was something I had in mind a long time ago: "[Feature Request] Add a river to ElasticSearch instance", issue #1077 in elastic/elasticsearch on GitHub.
But it did not really make sense.

We are currently working on snapshot and restore features, which could help you a lot with what you are looking for.
That said, it will only work if you have exactly the same data on all platforms.
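
As a sketch of how an on-demand push could look with snapshot and restore, assuming the REST API as it later shipped in Elasticsearch 1.0 and a shared filesystem repository that both clusters can reach. The repository name, snapshot name, and path are hypothetical. The function only builds the requests; sending them is left to whichever HTTP client you use.

```python
def promotion_plan(repo, snapshot, location, indices):
    """Return (cluster, method, path, body) steps to copy indices between clusters."""
    repo_body = {"type": "fs", "settings": {"location": location}}
    index_list = ",".join(indices)
    return [
        # Register the same shared repository on both clusters.
        ("source", "PUT", f"/_snapshot/{repo}", repo_body),
        ("target", "PUT", f"/_snapshot/{repo}", repo_body),
        # Take the snapshot on the source and wait until it finishes.
        ("source", "PUT",
         f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index_list}),
        # Restore on the target; an existing index must be closed or deleted first.
        ("target", "POST",
         f"/_snapshot/{repo}/{snapshot}/_restore",
         {"indices": index_list}),
    ]

if __name__ == "__main__":
    for step in promotion_plan("promote", "dev_to_test_1", "/mnt/es_backups", ["products"]):
        print(step)
```

Running the plan once for dev to test, and again for test to live only after the tests pass, gives the manual promotion described in the original question.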

Does it help or did I miss something?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 11 Sept 2013 at 13:32, Alexander Mitchell alexanderjmitchell@gmail.com wrote:


Hi David

Thanks for your response.

I'm not considering the need for replication within an 'instance' at the
moment, as failover is not a high-priority requirement.

Regarding option 1, is there a natural way that we can achieve the 'update
on demand' that we need? Am I correct that we can prevent cross talk
between the nodes through configuration? Do you have any experience of
rsync?

Thanks again

Alex

On Wednesday, 11 September 2013 13:53:42 UTC+1, David Pilato wrote:
