We would like to copy the gateway snapshot from a production cluster
to a development cluster. Since our production gateway is on s3,
copying indices to development could take several minutes, during
which the production cluster would still be snapshotting. Is this
safe? If not, could we disable snapshotting temporarily while the copy
is happening?
Also, is it possible to copy an s3 gateway to a filesystem gateway?
Yes, you can safely copy over the gateway data, either to another s3 gateway
or to a filesystem gateway, even while it's snapshotting.
-shay.banon
On Wed, Aug 4, 2010 at 10:46 PM, Grant Rodgers grantr@gmail.com wrote:
We would like to copy the gateway snapshot from a production cluster
to a development cluster. Since our production gateway is on s3,
copying indices to development could take several minutes, during
which the production cluster would still be snapshotting. Is this
safe? If not, could we disable snapshotting temporarily while the copy
is happening?
Also, is it possible to copy an s3 gateway to a filesystem gateway?
Great! Is it also possible to copy only certain indices? We would like
to be able to copy only the metadata directory and index directories
for the indices we care about.
I guess the question is, can the gateway recovery handle missing
indices in the gateway snapshot?
A simpler solution is to take the metadata-XXX file (it's a JSON file) and
remove the indices you don't want to use. It can also be hacked, for example
to change the number of replicas, before you start the cluster again (though
there will be an API for that).
-shay.banon
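For anyone wanting to script the metadata edit described above, here is a minimal Python sketch. The thread does not show the internal layout of the metadata-XXX file, so the `("meta-data", "indices")` key path, the `prune_indices` helper, and the example contents are all assumptions for illustration. Check them against your real file before relying on this.

```python
import json


def prune_indices(metadata, keep, path=("meta-data", "indices")):
    """Return a copy of `metadata` that keeps only the indices named in `keep`.

    `path` is the ASSUMED chain of keys leading to the mapping of
    index name -> index metadata; adjust it to match the real file.
    """
    pruned = json.loads(json.dumps(metadata))  # deep copy of plain JSON data
    node = pruned
    for key in path[:-1]:
        node = node[key]
    indices = node[path[-1]]
    node[path[-1]] = {name: meta for name, meta in indices.items() if name in keep}
    return pruned


# Hypothetical metadata contents, for illustration only.
example = {
    "meta-data": {
        "indices": {
            "logs": {"settings": {"index.number_of_replicas": "1"}},
            "users": {"settings": {"index.number_of_replicas": "1"}},
        }
    }
}

trimmed = prune_indices(example, keep={"users"})
print(json.dumps(trimmed, indent=2))
```

In practice you would `json.load` the copied metadata file, prune it, and `json.dump` it back into the development gateway before starting the cluster. Keep a backup of the original file.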
On Wed, Aug 4, 2010 at 10:56 PM, Grant Rodgers grantr@gmail.com wrote:
Great! Is it also possible to copy only certain indices? We would like
to be able to copy only the metadata directory and index directories
for the indices we care about.
I guess the question is, can the gateway recovery handle missing
indices in the gateway snapshot?
Do you think you can elaborate more on this? I am surprised this is
possible. My naive understanding is that if the snapshotting is going on
then some files in the gateway are changing. How is it then possible that the
copy is consistent?
It basically boils down to how Lucene works when storing the index, and the
additional md5 checksum files elasticsearch produces for them. Basically, an
index "version" is written to the gateway while another one exists, and only
when it's done being written to the gateway is the "old" one removed.
The transaction log is an append-only log, keyed by the index version,
so you just copy it over, and however many operations managed to get into it
are what you will get when you recover.
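The "write the new version fully, then remove the old one" idea above can be illustrated with a toy model in Python. This is not elasticsearch's gateway code: the `v<N>` directory layout, the `.md5` sidecar naming, and all function names are assumptions made up for this sketch. It only shows why a copy taken mid-snapshot can still contain a complete, verifiable version.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def write_version(gateway: Path, version: int, files: dict) -> None:
    """Write every file of a new version, then remove older versions."""
    new_dir = gateway / f"v{version}"
    new_dir.mkdir()
    for name, data in files.items():
        (new_dir / name).write_bytes(data)
        (new_dir / (name + ".md5")).write_text(hashlib.md5(data).hexdigest())
    # The old version is deleted only after the new one is fully written.
    for other in gateway.iterdir():
        if other != new_dir:
            shutil.rmtree(other)


def is_complete(version_dir: Path) -> bool:
    """True if each checksum file has a data file with a matching md5."""
    checksums = list(version_dir.glob("*.md5"))
    if not checksums:
        return False
    for checksum in checksums:
        data_file = version_dir / checksum.name[: -len(".md5")]
        if not data_file.exists():
            return False
        if hashlib.md5(data_file.read_bytes()).hexdigest() != checksum.read_text():
            return False
    return True


def newest_complete_version(copy: Path):
    """Return the highest version number whose files all verify, or None."""
    dirs = sorted(copy.glob("v*"), key=lambda p: int(p.name[1:]), reverse=True)
    for version_dir in dirs:
        if is_complete(version_dir):
            return int(version_dir.name[1:])
    return None


with tempfile.TemporaryDirectory() as tmp:
    gateway = Path(tmp) / "gateway"
    gateway.mkdir()
    write_version(gateway, 1, {"segment": b"old"})

    # Simulate a copy taken mid-snapshot: v1 is complete, v2 is half-written.
    copy = Path(tmp) / "copy"
    shutil.copytree(gateway, copy)
    partial = copy / "v2"
    partial.mkdir()
    (partial / "segment.md5").write_text(hashlib.md5(b"new").hexdigest())

    result = newest_complete_version(copy)
    print(result)  # prints 1: the half-written v2 is skipped
```

The point of the sketch is the ordering in `write_version`: because the old version is removed last, a reader that checks checksums always has some complete version to fall back on.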
This is yet another example of the consistently elegant and impressive
architecture of elasticsearch. Keep up the good work Shay. I for one
am amazed by the rapid pace of development!
On Thu, Aug 5, 2010 at 3:03 AM, Grant Rodgers grantr@gmail.com wrote:
This is yet another example of the consistently elegant and impressive
architecture of elasticsearch. Keep up the good work Shay. I for one
am amazed by the rapid pace of development!