Replicating one cluster to another cluster

Hey,

I was wondering if anyone has tried replicating one cluster to a new cluster
and keeping it in sync. For example, I have a production cluster and I need
to reindex all of its data. I would like to do this in a second cluster so
I can compare the changes, but if a document is updated in the original
index I want that update reflected in the replica.

I am pretty sure I could whip something up with scroll/scan, but if someone
has done this before and has code to share, that would be great.
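For reference, a one-shot scroll/scan copy can be sketched in a few dozen lines. This is a minimal sketch, not code from this thread. It assumes:
- the ES 1.x `search_type=scan` scroll API and the `_bulk` endpoint;
- Python 3's standard `urllib`;
- hypothetical cluster URLs and index names.

It copies the index as it stands when the scan starts. It does not keep the two clusters in sync: writes that reach the source after the scan begins are not picked up.

```python
import json
import urllib.request

# Hypothetical endpoints: replace with your own clusters.
SOURCE = "http://localhost:9200"
TARGET = "http://localhost:9201"


def es_request(url, body=None, method="GET"):
    """Send one HTTP request to Elasticsearch and decode the JSON reply."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hits_to_bulk(hits, target_index):
    """Turn search hits into a newline-delimited _bulk body.

    Only works if the source index stores _source.
    """
    lines = []
    for hit in hits:
        if "_source" not in hit:
            raise ValueError("document %s has no _source" % hit["_id"])
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"


def copy_index(src_index, dst_index, size=500):
    """One-shot copy of src_index (on SOURCE) into dst_index (on TARGET).

    Uses the ES 1.x scan/scroll API; with scan, size applies per shard.
    """
    url = "%s/%s/_search?search_type=scan&scroll=1m&size=%d" % (
        SOURCE, src_index, size)
    first = es_request(url, json.dumps({"query": {"match_all": {}}}), "POST")
    scroll_id = first["_scroll_id"]  # scan returns no hits on the first call
    copied = 0
    while True:
        page = es_request(SOURCE + "/_search/scroll?scroll=1m", scroll_id, "POST")
        hits = page["hits"]["hits"]
        if not hits:
            break
        result = es_request(TARGET + "/_bulk", hits_to_bulk(hits, dst_index), "POST")
        if result.get("errors"):
            raise RuntimeError("bulk indexing reported errors")
        copied += len(hits)
        scroll_id = page["_scroll_id"]
    return copied
```

Calling `copy_index("products", "products_v2")` (hypothetical index names) returns the number of documents copied. Later Elasticsearch versions dropped `search_type=scan` in favour of a plain scroll sorted by `_doc`, so the first request would need adjusting there.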

Thanks
Zuhaib

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c9f39a38-53f4-4f23-9aaa-0b270261ebae%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

There are a few ways:
- stream2es (GitHub: elastic/stream2es), which streams data into ES from
  Wikipedia, Twitter, stdin, or other ES clusters
- logstash with the elasticsearch and elasticsearch_http outputs:
  http://logstash.net/

There are also these two, which I haven't used:
- crate/elasticsearch-inout-plugin (GitHub), an Elasticsearch plugin that
  exports data by query on the server side
- jprante/elasticsearch-knapsack (GitHub), a plugin that acts as an
  import/export tool for Elasticsearch

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

(In reply to Zuhaib Siddique <zsiddique@gmail.com>, 8 January 2014 11:26)

First and most important, the good news: ES 1.0.0.Beta2 has the
snapshot/restore feature in place, so it should be easy to snapshot a
cluster and restore the result into a target cluster. Snapshots are also
incremental.
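A snapshot-then-restore copy could look roughly like the following. It is a sketch under these assumptions:
- It uses the snapshot/restore REST endpoints as documented for ES 1.x: `PUT /_snapshot/<repo>`, `PUT /_snapshot/<repo>/<snapshot>` and `POST /_snapshot/<repo>/<snapshot>/_restore`. These may differ slightly in 1.0.0.Beta2.
- It uses an `fs` repository on a filesystem path shared by all nodes of both clusters.
- The repository name, path and index names are hypothetical.

```python
import json
import urllib.request


def snapshot_plan(repo, location, snapshot, indices):
    """List the (cluster, method, path, body) calls for a snapshot-and-restore copy.

    location must be a shared filesystem path that every node of both
    clusters can reach (an assumption of this sketch).
    """
    repo_body = {"type": "fs", "settings": {"location": location}}
    selection = {"indices": ",".join(indices)}
    return [
        # Register the repository on the source cluster and take a snapshot.
        ("source", "PUT", "/_snapshot/%s" % repo, repo_body),
        ("source", "PUT",
         "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot), selection),
        # Register the same location on the target cluster and restore from it.
        ("target", "PUT", "/_snapshot/%s" % repo, repo_body),
        ("target", "POST", "/_snapshot/%s/%s/_restore" % (repo, snapshot), selection),
    ]


def run_plan(plan, urls):
    """Execute each call against urls["source"] or urls["target"]."""
    for cluster, method, path, body in plan:
        req = urllib.request.Request(urls[cluster] + path,
                                     data=json.dumps(body).encode("utf-8"),
                                     method=method)
        req.add_header("Content-Type", "application/json")
        # urlopen raises HTTPError on a non-2xx reply, which stops the plan.
        with urllib.request.urlopen(req) as resp:
            print(cluster, method, path, resp.status)
```

Snapshots are incremental, so repeating the plan with a new snapshot name only transfers segments changed since the previous snapshot. Restoring over an index that already exists on the target requires closing that index first. This approach therefore gives periodic refreshes of the copy rather than live replication.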

Second, there is also news about the knapsack plugin.

In the next knapsack plugin version, due this week, a full copy from
cluster1 to cluster2 will be as simple as:

curl -XPOST 'http://cluster1node:port1/_export/copy?cluster=cluster2name&host=cluster2node&port=port2'
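If the host names and ports come from configuration, the same request can be assembled in code. This is a minimal sketch that uses only the endpoint and parameter names from the command above. That plugin version is not released yet, so treat them as subject to change. Whether `port2` is the HTTP or the transport port is not stated here, and `knapsack_copy_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def knapsack_copy_url(src_host, src_port, cluster, host, port):
    """Build the knapsack _export/copy URL from the example command.

    Parameter names (cluster, host, port) are copied from that command;
    urlencode escapes any characters that are not URL-safe.
    """
    query = urlencode({"cluster": cluster, "host": host, "port": port})
    return "http://%s:%d/_export/copy?%s" % (src_host, src_port, query)
```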

The limitations will be:
- the knapsack plugin must be installed on cluster1node;
- cluster1 and cluster2 must run the same JVM version and the same ES version;
- all your indexes must have stored fields, preferably the _source field.

Also, cluster1 must not modify the indexes while _export/copy is running,
or cluster2 may end up with different data (there is no inherent locking).
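Because there is no locking, it is worth checking the copy afterwards. The sketch below is an independent check, not part of knapsack. Before copying, it compares the `version.number` reported by `GET /` on both clusters. After copying, it compares each index's `_count`. It assumes Python 3 and hypothetical URLs. Equal counts do not prove identical documents, and the JVM version check is left out.

```python
import json
import urllib.request


def get_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def same_es_version(info_a, info_b):
    """Compare version.number from GET / on each cluster."""
    return info_a["version"]["number"] == info_b["version"]["number"]


def count_mismatches(src_counts, dst_counts):
    """Return {index: (source_count, target_count)} for every index that differs."""
    return {
        index: (n, dst_counts.get(index))
        for index, n in src_counts.items()
        if dst_counts.get(index) != n
    }


def preflight(src_url, dst_url):
    """Run before copying: refuse to continue if the ES versions differ."""
    if not same_es_version(get_json(src_url + "/"), get_json(dst_url + "/")):
        raise RuntimeError("clusters run different Elasticsearch versions")


def verify(src_url, dst_url, indices):
    """Run after copying; refresh the target first so _count sees every document."""
    src = {i: get_json("%s/%s/_count" % (src_url, i))["count"] for i in indices}
    dst = {i: get_json("%s/%s/_count" % (dst_url, i))["count"] for i in indices}
    return count_mismatches(src, dst)
```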

In the new knapsack export version, you will be able to use arbitrary ES
queries to select subsets of the cluster data to copy, so that only the
hits of a query are transferred.

Jörg

A colleague of mine here at TaskRabbit whipped up a node.js-based tool
similar to these:

His main use case is replicating an ES cluster from our production system
to a staging/test environment. I believe it has the same requirements as
other similar tools, mainly that the source index must have the original
documents stored in the _source field.
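Since all of these tools depend on `_source`, a quick pre-check is to look for mapping types that disable it. This is a minimal sketch with these assumptions:
- the ES 1.x `GET /<index>/_mapping` response layout, `{index: {"mappings": {type: ...}}}`;
- `_source` counts as disabled only when the mapping explicitly sets `"enabled": false`;
- the function names are hypothetical.

```python
import json
import urllib.request


def types_without_source(type_mappings):
    """Return the mapping types whose _source field is disabled.

    type_mappings is {type_name: mapping}. _source is on by default, so
    only an explicit {"_source": {"enabled": false}} counts as disabled.
    """
    return sorted(
        name for name, mapping in type_mappings.items()
        if mapping.get("_source", {}).get("enabled", True) is False
    )


def check_index(base_url, index):
    """Fetch one index's mappings and list the types that cannot be copied."""
    with urllib.request.urlopen("%s/%s/_mapping" % (base_url, index)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumes the 1.x layout and that `index` is a concrete index, not an alias.
    return types_without_source(body[index]["mappings"])
```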

Feedback and criticism are welcome.

Aaron
