How to get a local ES to do an async copy over (replicate-over) of Prod ES indices?

I know that it is possible to start the index without any replicas, and add
them at a later point:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
But doing so in the simplest manner would only cause the replicated copy to
exist on the cloud where my production ES instance lives.

Can anyone tell me how to start a local ES instance and perform an async
replication of the data from the cloud over to my machine? If replication
is not the right term, then let's call it a data dump or whatever else fits
the bill here. Any thoughts?
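
For reference, the replica change mentioned above goes through the update-settings API linked in the question. What follows is only a minimal Python sketch of that request, assuming a node reachable over HTTP at a base URL you supply. The helper names `replica_settings_request` and `send` and the index name `products` are hypothetical and not part of any client library. As the question notes, this only adds copies inside the existing cluster, so on its own it does not bring data to a local machine.

```python
import json
import urllib.request


def replica_settings_request(index, replicas):
    """Build the method, path and JSON body for changing an index's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be zero or more")
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"/{index}/_settings", body


def send(base_url, method, path, body):
    """Send the request to a node; base_url is an assumption, e.g. http://localhost:9200."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `send("http://localhost:9200", *replica_settings_request("products", 1))`, with the URL and index name replaced by your own.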

Want to know this also. Posting to watch replies.

The following thread doesn't really answer my question (maybe because I
don't get how to set it up):
http://elasticsearch-users.115913.n3.nabble.com/how-to-dump-the-entire-contents-of-ES-td2758234.html

But Shay's comment, "copy of the data directory of each production cluster,
and move it to development" makes sense. I'll try that out, but in the
meantime any other direct answers to my question would be most welcome.

On Thu, Mar 15, 2012 at 12:21 PM, gearond gearond@sbcglobal.net wrote:

Want to know this also. Posting to watch replies.

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/How-to-get-a-local-ES-to-do-an-async-copy-over-replicate-over-of-Prod-ES-indices-tp3829318p3829442.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Unfortunately the SCP command to get the data directory runs into problems
early on:

====
Sending file modes: C0664 51257065 _5kr.fdt
Sink: C0664 51257065 _5kr.fdt
_5kr.fdt 46% 23MB 0.0KB/s - stalled -
====

So I guess that's not a good route to go.

On Thu, Mar 15, 2012 at 12:39 PM, Pulkit Singhal pulkitsinghal@gmail.com wrote:

The following thread doesn't really answer my question (maybe because I
don't get how to set it up):

http://elasticsearch-users.115913.n3.nabble.com/how-to-dump-the-entire-contents-of-ES-td2758234.html

But Shay's comment, "copy of the data directory of each production
cluster, and move it to development" makes sense. I'll try that out, but in
the meantime any other direct answers to my question would be most welcome.


So, since it's tougher to answer a question when something actually cannot
happen, I take it that there is no way to start a local ES instance and
perform an async replication of the data from the cloud over to my machine?

On Thu, Mar 15, 2012 at 12:57 PM, Pulkit Singhal pulkitsinghal@gmail.com wrote:

Unfortunately the SCP command to get the data directory runs into problems
early on:

====
Sending file modes: C0664 51257065 _5kr.fdt
Sink: C0664 51257065 _5kr.fdt
_5kr.fdt 46% 23MB 0.0KB/s - stalled -

So I guess that's not a good route to go.


Right, there is no built-in mechanism to do that. You'd have to write code
that reads from one and writes to the other yourself.

Copying the files should have worked; I'm not sure what the problem is.
There have been some discussions about this on the list. If you search for
"backup" in the mailing list archive, you should be able to find them.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Fri, Mar 16, 2012 at 8:58 AM, Pulkit Singhal pulkitsinghal@gmail.com wrote:

So, since it's tougher to answer a question when something actually cannot
happen, I take it that there is no way to start a local ES instance and
perform an async replication of the data from the cloud over to my machine?


On Fri, 2012-03-16 at 09:05 -0400, Berkay Mollamustafaoglu wrote:

Right, there is no built-in mechanism to do that. You'd have to write
code that reads from one and writes to the other yourself.

And here's some code:

http://blogs.perl.org/users/clinton_gormley/2011/04/elasticsearchpm-v036-now-with-extra-sugar.html

clint


A better way to make the transfer, so that it can be picked up where it
left off, is to:

a) zip up the data directory:
tar -zcvf data.tar.gz /opt/elasticsearch/data

b) use the rsync command, which lets you pick up where you left off in case
something goes wrong:
rsync --rsh='ssh -i /users/xxx/.ec2/ec2.pem' --partial --progress \
  ec2-user@XXX.XXX.XXX.XXX:/opt/elasticsearch/data.tar.gz ~/dev/elasticsearch/
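
One detail worth noting about the two steps above: tar writes data.tar.gz into the current directory, so step a) presumably has to be run from /opt/elasticsearch for the path in step b) to match. If the connection keeps dropping, the rsync call can also be wrapped in a retry loop. The Python sketch below is one hedged way to do that; `rsync_command` and `pull_with_retries` are hypothetical helper names, and the only rsync options used are the ones already given in the post.

```python
import subprocess


def rsync_command(key_file, remote_path, local_dir):
    """Build the rsync invocation from the post as an argument list."""
    return [
        "rsync",
        f"--rsh=ssh -i {key_file}",  # rsync splits this value on spaces itself
        "--partial",                 # keep partially transferred files
        "--progress",
        remote_path,
        local_dir,
    ]


def pull_with_retries(command, attempts=5, run=subprocess.call):
    """Run the command until it exits with status 0; return the attempt number."""
    for attempt in range(1, attempts + 1):
        if run(command) == 0:
            return attempt
    raise RuntimeError(f"transfer still failing after {attempts} attempts")
```

Because no shell is involved here, a destination such as ~/dev/elasticsearch/ would need `os.path.expanduser` applied before it is passed in, and a key path containing spaces would not survive the `--rsh` value as written.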


That's truly awesome, Clint!
Unfortunately for me, I'm not using Perl (so my working knowledge has
atrophied), which is also why I had trouble making the best of the "terms
of endearment" slides. I'm not sure if I can learn enough to simply run
with this, but I'll try. Thanks for the great work.

On Fri, Mar 16, 2012 at 9:07 AM, Clinton Gormley clint@traveljury.com wrote:

On Fri, 2012-03-16 at 09:05 -0400, Berkay Mollamustafaoglu wrote:

Right, there is no built-in mechanism to do that. You'd have to write
code that reads from one and writes to the other yourself.

And here's some code:

http://blogs.perl.org/users/clinton_gormley/2011/04/elasticsearchpm-v036-now-with-extra-sugar.html

clint


Yes, copying the data location over to your local environment should work;
I'm not sure why it failed. It's usually recommended that you disable flush
while doing so.
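
The post does not say how flush is disabled. The Python sketch below assumes the `index.translog.disable_flush` setting that Elasticsearch releases of that era exposed through the update-settings API; newer versions may not accept it, so treat the setting name as an assumption to check against your version's documentation. `flush_setting_body`, `flush_disabled` and the `put_settings` callable are hypothetical names used only for illustration.

```python
import json
from contextlib import contextmanager


def flush_setting_body(disabled):
    """JSON body toggling the (assumed, version-dependent) translog flush setting."""
    return json.dumps({"index": {"translog.disable_flush": disabled}})


@contextmanager
def flush_disabled(put_settings):
    """Disable flush for the duration of the block, then re-enable it.

    put_settings is a hypothetical callable that PUTs a JSON body to the
    cluster's /_settings endpoint.
    """
    put_settings(flush_setting_body(True))
    try:
        yield
    finally:
        # Always restore flushing, even if the copy fails part-way.
        put_settings(flush_setting_body(False))
```

The copy of the data directory would then run inside a `with flush_disabled(put_settings):` block, so that flushing is switched back on whether or not the copy succeeds.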

Another option, which Clinton provided an example for, is to use a scroll
search and reindex the data (or a portion of it, based on the query).
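
Clinton's example is in Perl. For readers not using Perl, here is a rough Python sketch of the same scroll-and-reindex idea, not a translation of his code. It assumes the REST shape of more recent Elasticsearch releases (a `scroll` parameter on `_search`, `/_search/scroll`, and newline-delimited `/_bulk`); releases from the time of this thread also expected a `_type` in each bulk action and offered a `scan` search type, so adjust for your version. The `src_post` and `dst_post` callables are hypothetical transports you would supply: the first takes a path and a dict and returns the decoded JSON response, the second takes a path and a newline-delimited JSON string.

```python
import json


def scroll_hits(post, index, query=None, page_size=500, keep_alive="5m"):
    """Yield every matching hit from `index`, one scroll page at a time."""
    body = {"size": page_size, "query": query or {"match_all": {}}}
    page = post(f"/{index}/_search?scroll={keep_alive}", body)
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})


def bulk_body(hits, dest_index):
    """Render hits as newline-delimited JSON for the bulk API."""
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"


def copy_index(src_post, dst_post, index, query=None, batch_size=500):
    """Read from one cluster and write to the other; return documents copied."""
    batch, copied = [], 0
    for hit in scroll_hits(src_post, index, query, batch_size):
        batch.append(hit)
        if len(batch) == batch_size:
            dst_post("/_bulk", bulk_body(batch, index))
            copied += len(batch)
            batch = []
    if batch:  # send whatever is left over from the last page
        dst_post("/_bulk", bulk_body(batch, index))
        copied += len(batch)
    return copied
```

Passing a `query` copies only the matching portion of the index. The sketch does not copy index settings or mappings and does not inspect the bulk responses for per-document errors, both of which a real script would want to handle.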

On Fri, Mar 16, 2012 at 5:48 PM, Pulkit Singhal pulkitsinghal@gmail.com wrote:

A better way to make the transfer, so that it can be picked up where it
left off, is to:

a) zip up the data directory:
tar -zcvf data.tar.gz /opt/elasticsearch/data

b) use the rsync command, which lets you pick up where you left off in case
something goes wrong:
rsync --rsh='ssh -i /users/xxx/.ec2/ec2.pem' --partial --progress \
  ec2-user@XXX.XXX.XXX.XXX:/opt/elasticsearch/data.tar.gz ~/dev/elasticsearch/
