Feature request: Ability to snapshot gateway to a new location


(ppearcy) #1

Hey,
First off, congrats on 0.13.0. Been eagerly awaiting this update
because of my fear of index corruption with 0.12.

We had some production issues that are the impetus for this request.
Basically, our gateway blew up. In this case, an iSCSI link went
bad, causing massive disk corruption. We're making updates to avoid
corruption in this case. After this occurred I knew that the data on
the gateway was probably shot. However, my three node cluster was
still up and happily indexing, just no longer had a gateway to write
to. Tons of file not found exceptions.

What would have been awesome to be able to do at this point was bring
up another share and send a command to the cluster to snapshot its
entire state down to that share. Obviously, this would be an expensive
operation, very similar to full gateway recovery, just in reverse.

Similarly, this would be a feature that would allow for reliable hot
back-ups of the gateway. The reason I say reliable is that I had
a copy of my gateway data from a couple of weeks ago that I attempted
to restore. This copy was made with a cp -R command while we were
actively indexing. The gateway came up, however, ~25% of the content
was missing. It was from the larger indexes that have the more
frequent updates. I know it has been stated that a gateway copy
should always be valid, but my experience is that this is not the case with
~30GB of data receiving a flow of updates, a few per second
or so.
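
One way to check a copy like this before relying on it is to compare checksums of the copy against the live gateway directory. The following is only a minimal sketch: the function names are hypothetical, it uses nothing from elasticsearch itself, and it does not make a hot copy consistent. It only reports files that are missing from, extra in, or different in the copy at the moment of comparison, so it is most informative while indexing is paused.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copy(source: Path, copy: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or changed in the copy."""
    src, dst = manifest(source), manifest(copy)
    return {
        "missing": sorted(set(src) - set(dst)),
        "extra": sorted(set(dst) - set(src)),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

An empty result in all three lists means the copy matched the source byte for byte when the comparison ran; it says nothing about whether the source itself was in a consistent state.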

I'd happily raise a feature request for this, if it is something that
is doable without massive rework.

Thanks!
Paul


(Shay Banon) #2
Hi Paul,

Basically, the gateway is adaptive: if things are not as expected, they
will get copied over. So, in theory, you should be able to unmount and mount
to a new location, and a new snapshot will happen. I say in theory since I
have not tested it, and I believe that, at least with the file system based
gateway, I need to add checks that the directories are there, and if not,
recreate them. What do you think?

Regarding the backup of the gateway data, what do you mean by missing
data? You should see data basically up to the point where you cp -R it (or a
bit later).

We talked about it a bit, but the local gateway should really simplify
things without the need to have a shared file system. You do lose the
ability to have a "backup", but basically, if all works according to plan,
the replicas are your backup...

-shay.banon
	
	

(Berkay Mollamustafaoglu-2) #3

There is always a potential for something to go wrong, especially in the
early days, and replicas are not really a solution, since if there is
corruption, it can potentially affect the replicas as well.
It is much better for mental health to have an offline copy of the data that
you can verify. Ideally, we could take a backup of an index and start it
somewhere else (another ES cluster). Can there be an API call that
backs up the specified indices?

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


(Shay Banon) #4

Yes, there can be an API for that, just requires some work ;). But,
regarding the chances of an index being corrupted, there is a lot of work on
the Lucene level to make sure that this does not happen, and elasticsearch
builds on that as well, and I feel pretty good with 0.13 and how it handles
these cases.


(Berkay Mollamustafaoglu-2) #5

Understood :)
Just FYI, there are uses other than corruption for such an API call. It is
often necessary to get the index out of the production environment and move
it to another ES instance for troubleshooting, testing, etc.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


(Shay Banon) #6

Agreed, though as data grows, it makes less and less sense to have this
feature, simply because of the overhead it will have. Still, because of the
delta based snapshotting, it can work well for shared gateway based cases,
and I would love to try and solve it for local gateway cases as well...
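
To illustrate what delta based snapshotting means in general, here is a minimal sketch that copies only files that are new or whose size or modification time changed since the previous snapshot. This is not elasticsearch's gateway implementation; the function name and the change-detection rule are assumptions made purely for illustration, and the sketch does not handle files deleted from the source.

```python
import shutil
from pathlib import Path


def delta_snapshot(source: Path, target: Path) -> list[str]:
    """Copy only files that are new or changed since the last snapshot.

    A file counts as changed when its size or modification time differs
    from the copy already in target. Returns the relative paths copied.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if dst_file.exists():
            s, d = src_file.stat(), dst_file.stat()
            if s.st_size == d.st_size and s.st_mtime_ns == d.st_mtime_ns:
                continue  # unchanged since the previous snapshot
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies the mtime
        copied.append(rel.as_posix())
    return copied
```

Because unchanged files are skipped, repeated snapshots to the same target cost roughly in proportion to what changed, which is why the approach suits a shared location that is snapshotted over and over. A snapshot to a brand new location still has to copy everything once.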
	
	

(ppearcy) #7

Hey Shay,
Being able to remount to a new point and have everything synced down
would be pretty awesome. I'll give it a shot to see what happens on
0.13.0.

When I say missing data after loading up a gateway generated with
cp -R, there were 6 indexes that came up completely empty. When I looked
at the files on the gateway, the size looked correct. No exceptions
were thrown regarding invalid commit points, either.

Thanks,
Paul


(ppearcy) #8

Btw, we're finally on board with the local gateway. We liked having one
central point to back up with the shared gateway, but since this is the
only advantage I know of, it just hasn't been worth the operational
support of creating a solid redundant shared gateway.

Quick question... are you able to make hot backups of the local
gateway? Or fundamentally, should this be achieved via replicas?

Thanks,
Paul


(Shay Banon) #9

Fundamentally, the idea is to achieve it using replicas. You can certainly
back up the actual data location for each node, and then restore for each,
but local gateway with replicas should perform better and be less expensive
with the actual copy process (which will be IO intensive).

-shay.banon

