Copying data between clusters

Hello,

We have two clusters and want to copy data from cluster1 (0.18.5) to
cluster2 (0.19.1). Is this possible?

Thanks,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Hi Rafał,

If you want to do so in real time, keeping both clusters updated together, I
guess you may use this plugin:

Besides that, I think you'll have to implement a custom solution, as there
is no notification/changes API in ES yet. You may take a look at this post
about the feature (quite long):
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/changes$20API/elasticsearch/S3fSfr4Cz3g/fyse5X4ofuYJ

If you only need to create a new 0.19 cluster from the current 0.18 data,
AFAIK you just have to back up the 0.18 data, following the steps mentioned here
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/BtjDXzKdAIk
(disable flushing, force a flush, copy, re-enable flushing). These steps are
reproduced in Karussell's script (Backup ElasticSearch with rsync · GitHub).
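
A minimal Python sketch of those four steps follows. It is illustrative only: the `index.translog.disable_flush` setting name is an assumption about what 0.18/0.19 accept, the node URL and directory paths are placeholders, and `backup_plan` / `run` are hypothetical helper names, not part of any script mentioned above.

```python
import json
import subprocess
import urllib.request


def flush_setting_body(disabled):
    """JSON body toggling translog flushing (setting name is an assumption)."""
    return json.dumps({"index": {"translog.disable_flush": disabled}})


def backup_plan(es_url, data_dir, backup_dir):
    """Return the ordered steps as (kind, target, payload) tuples."""
    return [
        ("PUT", es_url + "/_settings", flush_setting_body(True)),   # disable flushing
        ("POST", es_url + "/_flush", None),                         # force a flush
        ("RSYNC", data_dir, backup_dir),                            # copy the data
        ("PUT", es_url + "/_settings", flush_setting_body(False)),  # re-enable flushing
    ]


def execute(step):
    """Run one step: either an rsync copy or an HTTP call to the node."""
    kind, target, payload = step
    if kind == "RSYNC":
        subprocess.check_call(["rsync", "-a", target, payload])
    else:
        data = payload.encode("utf-8") if payload else None
        req = urllib.request.Request(
            target, data=data, method=kind,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req).close()


def run(plan):
    """Execute the plan; the last step (re-enable) runs even if a copy fails."""
    *steps, reenable = plan
    try:
        for step in steps:
            execute(step)
    finally:
        execute(reenable)
```

Calling `run(backup_plan("http://localhost:9200", "/path/to/data/", "/path/to/backup/"))` would perform the sequence against a live node. Keeping the plan separate from its execution makes the ordering easy to inspect before anything touches the cluster.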

Then you use the copy of your data as the 'data' directory for your 0.19
server. When you start it for the first time, it should update the info to
the new format.

Hope it helps (at least until you get a response from someone more
experienced with ES) :)

Cheers,

Thanks Frederic,

The 0.19.1 cluster is not empty, so I can't just flush 0.18.5, copy the data
directory between nodes and then start 0.19.1, as I'd lose the data already
in the new cluster that way. What I would like to achieve is to copy the data
from 0.18.5 into 0.19.1.

I was thinking about connecting those two clusters together to form a new
one, increase the replica count to be equal to the number of nodes and wait
for ES to finish copying data. In theory it seems doable, but I don't know if
there are any practical obstacles that I should be aware of, like version
incompatibilities.

Regards,
Rafał

You can't connect a 0.18.5 cluster to a 0.19.1 cluster. You will see
something like this in the 0.18.5 logs:

[2012-03-24 ...][WARN ][discovery.zen.ping.multicast] [Spyne] failed to read requesting node from /...:...
java.io.IOException: Expected handle header, got [-9]
    at org.elasticsearch.common.io.stream.HandlesStreamInput.readUTF(HandlesStreamInput.java:63)
    at org.elasticsearch.cluster.ClusterName.readFrom(ClusterName.java:63)
    at org.elasticsearch.cluster.ClusterName.readClusterName(ClusterName.java:58)
    at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing$Receiver.run(MulticastZenPing.java:363)
    at java.lang.Thread.run(Thread.java:722)

This seems to be caused by slight changes in the header structure of the
multicast pings.
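
Since the two versions cannot join one cluster, one possible alternative is to move documents over HTTP: read them from 0.18.5 with a search (for example a scan/scroll search) and write them to 0.19.1 through the bulk API. The sketch below shows only the conversion step, turning search hits into a bulk request body. It assumes `_source` is stored for the indices involved, and `hits_to_bulk` is a hypothetical helper name.

```python
import json


def hits_to_bulk(hits):
    """Turn search hits into a newline-delimited bulk request body.

    Each hit becomes two lines: an "index" action carrying the original
    index, type and id, followed by the document source.
    """
    lines = []
    for hit in hits:
        action = {
            "index": {
                "_index": hit["_index"],
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"
```

The returned string would be sent as the body of a POST to `/_bulk` on the target cluster, one batch of hits at a time. Mappings and index settings are not carried over by this approach and would need to be created on the target beforehand.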

Jörg

Hi,

That's what I was afraid of. Thanks Jörg.

Rafał
