How to get a local ES to do an async copy over (replicate-over) of Prod ES indices?

pulkitsinghal · March 15, 2012, 4:42pm

I know that it is possible to start the index without any replicas, and add
them at a later point:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
But doing so in the simplest manner would only cause the replicated copy to
exist on the cloud where my production ES instance lives.

Can anyone tell me how to start a local ES instance and perform an async
replication of the data from the cloud over to my machine? If replication
is not the right term, then let's call it a data dump or whatever else fits
the bill here. Any thoughts?

gearond · March 15, 2012, 5:21pm

Want to know this also. Posting to watch replies.

pulkitsinghal · March 15, 2012, 5:39pm

The following thread doesn't really answer my question (maybe because I
don't get how to set it up):
http://elasticsearch-users.115913.n3.nabble.com/how-to-dump-the-entire-contents-of-ES-td2758234.html

But Shay's comment, "copy of the data directory of each production cluster,
and move it to development" makes sense. I'll try that out, but in the
meantime any other direct answers to my question would be most welcome.

pulkitsinghal · March 15, 2012, 5:57pm

Unfortunately the SCP command to get the data directory runs into problems
early on:

====
Sending file modes: C0664 51257065 _5kr.fdt
Sink: C0664 51257065 _5kr.fdt
_5kr.fdt     46%   23MB   0.0KB/s - stalled -
====

So I guess that's not a good route to go.

pulkitsinghal · March 16, 2012, 12:58pm

Since it's tougher to answer a question when something actually cannot
happen, I take it that there is no way to start a local ES instance and
perform an async replication of the data from the cloud over to my machine?

Berkay_Mollamustafao · March 16, 2012, 1:05pm

Right, there is no built-in mechanism to do that. You'd have to write code
that reads from one and writes to the other yourself.

Copying the files should have worked; I'm not sure what that problem is.
There have been some discussions about this on the list. If you search for
"backup" in the mailing list archive you should be able to find them.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

Clinton_Gormley · March 16, 2012, 2:07pm

On Fri, 2012-03-16 at 09:05 -0400, Berkay Mollamustafaoglu wrote:

Right, there is no built-in mechanism to do that. You'd have to write
code that reads from one and writes to the other yourself.

And here's some code:

http://blogs.perl.org/users/clinton_gormley/2011/04/elasticsearchpm-v036-now-with-extra-sugar.html

clint

pulkitsinghal · March 16, 2012, 3:48pm

A better way to make the transfer, so that it can be picked up where it
left off, is to:

a) zip up the data directory:
tar -zcvf data.tar.gz /opt/elasticsearch/data

b) use the rsync command, which lets you pick up where you left off in case
something messes up:
rsync --rsh='ssh -i /users/xxx/.ec2/ec2.pem' --partial --progress \
  ec2-user@XXX.XXX.XXX.XXX:/opt/elasticsearch/data.tar.gz ~/dev/elasticsearch/
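
Step (b) still has to be re-run by hand whenever the connection stalls. As a rough sketch only, the retry could be automated along these lines. The function names, the attempt count and the delay are all assumptions made for illustration, not part of the original post; the rsync flags are the ones from step (b).

```python
import subprocess
import time


def build_rsync_command(key_path, remote, local_dir):
    """Return the rsync argument list that mirrors the command in step (b)."""
    return [
        "rsync",
        "--rsh=ssh -i " + key_path,  # rsync splits this value on whitespace itself
        "--partial",                 # keep partially transferred files for resuming
        "--progress",
        remote,
        local_dir,
    ]


def fetch_with_retries(command, attempts=5, delay_seconds=10, run=subprocess.call):
    """Re-run the command until it exits with 0 or attempts run out.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if run(command) == 0:
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)  # brief pause before resuming the transfer
    return False
```

It would be called with the key path, the remote `data.tar.gz` location and the local directory from step (b), for example `fetch_with_retries(build_rsync_command(key, remote, local_dir))`. Because `--partial` keeps the incomplete file, each retry can reuse what has already arrived instead of starting over. A key path containing spaces would need different handling.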

pulkitsinghal · March 16, 2012, 3:52pm

That's truly awesome, Clint!
Unfortunately for me, I'm not using Perl (so my working knowledge has
atrophied), which is also why I had trouble making the best of the
"terms of endearment" slides. I'm not sure I can learn enough to simply
run with this, but I'll try. Thanks for the great work.

kimchy · March 17, 2012, 10:24am

Yeah, copying the data location over to your local environment should work;
I'm not sure why it failed. Usually, it's recommended that you disable flush
when doing so.

Another option, which Clinton provided an example for, is to use a scroll
search and reindex the data (or a portion of it, based on the query).
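
For readers who are not using Perl, the scroll-and-reindex idea can be sketched in Python using only the standard library. This is not Clinton's code, and the function names, hosts, page size and keep-alive value are all assumptions. The request shapes follow the scroll and bulk REST endpoints of recent Elasticsearch versions, which have no mapping types; the releases in use when this thread was written differ, so URLs and bodies would need adjusting there.

```python
import json
import urllib.request


def http_json(method, url, body=None):
    """Send a dict as JSON (or bytes as NDJSON) and decode the JSON reply."""
    headers, data = {}, None
    if isinstance(body, bytes):
        data, headers["Content-Type"] = body, "application/x-ndjson"
    elif body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def scroll_hits(source, index, query, send=http_json, page_size=500, keep_alive="5m"):
    """Yield every hit matching the query, one scroll page at a time."""
    page = send("POST", f"{source}/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        page = send("POST", f"{source}/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})


def bulk_body(hits, index):
    """Encode hits as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_index": index, "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    return ("\n".join(lines) + "\n").encode("utf-8")


def copy_index(source, target, index, query=None, send=http_json, batch_size=500):
    """Copy matching documents from source to target; return how many were sent."""
    query = query or {"match_all": {}}
    batch, copied = [], 0
    for hit in scroll_hits(source, index, query, send=send, page_size=batch_size):
        batch.append(hit)
        if len(batch) >= batch_size:
            send("POST", f"{target}/_bulk", bulk_body(batch, index))
            copied += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        send("POST", f"{target}/_bulk", bulk_body(batch, index))
        copied += len(batch)
    return copied
```

A call such as `copy_index("http://prod-host:9200", "http://localhost:9200", "myindex")` would copy a whole index (the host and index names are placeholders), and passing a `query` copies only the matching portion. The sketch does not create the target index or its mappings, does not inspect the bulk response for per-document errors, and does not clear the scroll context when it finishes.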
