We are designing our first elasticsearch deployment. We need development,
test and live 'instances'. I use the word 'instance' because I'm not sure at
this stage whether each should be a node or a cluster.
Two essential requirements for us are:
- No cross talk between any 'instances' (i.e. no queries for the live
'instance' being routed to test or development).
- The ability to synchronise data from the development 'instance' to the
test and then live 'instances' on demand, not automatically. That is, we
want to develop against a particular dataset, push that data to test,
conduct formal testing and then push that data to our live instance if and
only if the tests pass.
We are evaluating two options:
- One cluster, three nodes, shards replicated across nodes:
If we were to have one cluster with three nodes, I think we could prevent
cross talk using routing (and possibly preference?), but I don't see a way
of preventing elasticsearch from synchronising data from the configured
master node (on development) to the test and live shards if they are all
part of the same cluster. I suppose we could remove test and live from the
cluster and then rejoin them to carry out the updates, but is there a more
natural way to achieve this 'update on demand'? Am I correct that we can
prevent cross talk between the nodes through configuration?
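On the configuration question: elasticsearch's shard allocation filtering can pin an index's shards to nodes carrying a given custom attribute, which is one way to keep each environment's data on its own node inside a single cluster. A minimal sketch in Python that builds the create-index request body, assuming each node is tagged with a hypothetical attribute `env` (`node.attr.env: test` in elasticsearch.yml on recent releases; older releases used `node.env: test`) and that each environment uses its own index name:

```python
import json


def pinned_index_settings(env: str, shards: int = 1, replicas: int = 0) -> dict:
    """Create-index body that confines an index's shards to nodes tagged `env`.

    Assumes every node declares a custom attribute named `env` (hypothetical)
    in its elasticsearch.yml.
    """
    return {
        "settings": {
            "index.number_of_shards": shards,
            # Replicas may only be placed on other nodes with the same tag, so
            # with one node per environment they must be 0 or stay unassigned.
            "index.number_of_replicas": replicas,
            # Allocation filtering: only nodes whose `env` attribute equals
            # this value may hold shards of the index.
            "index.routing.allocation.require.env": env,
        }
    }


# Body for e.g. PUT /products_test (the index name is hypothetical).
print(json.dumps(pinned_index_settings("test"), indent=2))
```

This separates environments only at the storage level: queries must still target the right index, and promoting data from development to test still means copying it between indices.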
- Three clusters, each with one node:
It seems that some people
(http://elasticsearch-users.115913.n3.nabble.com/Replicating-data-from-one-ES-cluster-to-another-td4034172.html)
use rsync (http://rsync.samba.org/) to copy the underlying Lucene index
files between clusters once an update is complete, so we could have
development, test and live clusters and use rsync to push data between them
as required. I've worked at a relatively low level with Lucene before and
am broadly happy that this approach would work, but can anyone share their
experience of trying it?
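The rsync approach could be scripted along these lines. This is a sketch under several assumptions: the target node is stopped (or the index closed) while files are copied, since Lucene segments copied under a running writer may be inconsistent; both clusters run the same elasticsearch version with identical index names and shard counts; and the paths and host name shown are hypothetical.

```python
import shlex
from typing import List


def rsync_index_command(src_dir: str, dest_host: str, dest_dir: str) -> List[str]:
    """Build an rsync invocation that mirrors a node's data directory.

    Assumes the destination node is stopped during the copy. All paths and
    the host name passed in are hypothetical examples.
    """
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve times/permissions
        "--delete",  # drop files on the target that no longer exist on the source
        src_dir.rstrip("/") + "/",  # trailing slash: copy the contents, not the dir
        f"{dest_host}:{dest_dir}",
    ]


cmd = rsync_index_command("/var/lib/elasticsearch/dev-data", "test-es",
                          "/var/lib/elasticsearch/data")
print(shlex.join(cmd))
```

`--delete` matters here: without it, segment files that Lucene merged away on development would linger on the target.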
I've searched for an elasticsearch-to-elasticsearch river but have not
been able to find one. If one exists, it would obviously be a good option.
Does anyone know of one?
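In the absence of a river, one alternative is to read every document from the source cluster with a scroll search and write it to the target cluster with the _bulk API. The helper below is a sketch that only builds the _bulk request body from a batch of hits (issuing the HTTP requests is left out). It assumes hits shaped like search responses, with `_id` and `_source`, and a hypothetical target index name; older releases also expect a `_type` in each action line.

```python
import json
from typing import Iterable, Mapping


def bulk_body(hits: Iterable[Mapping], target_index: str) -> str:
    """Turn search hits from the source cluster into a _bulk request body.

    The _bulk API takes newline-delimited JSON: one action line, then one
    source line per document, with a trailing newline at the end.
    """
    lines = []
    for hit in hits:
        # Keep the original document id so repeated pushes overwrite, not duplicate.
        lines.append(json.dumps({"index": {"_index": target_index, "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""  # nothing to send for an empty batch
    return "\n".join(lines) + "\n"


hits = [
    {"_id": "1", "_source": {"title": "first"}},
    {"_id": "2", "_source": {"title": "second"}},
]
print(bulk_body(hits, "products_test"), end="")
```

Because the push is a copy you trigger explicitly, it fits the 'on demand, only if tests pass' requirement; the trade-off against rsync is that every document is re-indexed on the target.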
Broadly, I can't believe that our usage pattern of separated dev -> test ->
live with on-demand updates hasn't been used elsewhere, or that this
problem hasn't been solved before. Does anyone have experience of
solving this? Have we missed something obvious here?
Thanks
Alex