No commit point data is available in gateway

We suddenly started seeing this message after launching another
instance:

---snip---
[2012-02-08 16:07:42,873][WARN ][cluster.action.shard ] [es1.dev.example.ec2] sending failed shard for [ideas][3], node[_lKO3A9mS5W3wrDdvuAZlg], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[ideas][3] No commit point data is available in gateway]]]
[2012-02-08 16:07:43,311][WARN ][index.gateway.s3 ] [es1.dev.studyblue.ec2] [ideas][3] listed commit_point [commit-16qe]/[55382], but not all files exists, ignoring
[2012-02-08 16:07:43,311][WARN ][indices.cluster ] [es1.dev.example.ec2] [ideas][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [ideas][3] No commit point data is available in gateway
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.recover(BlobStoreIndexShardGateway.java:434)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
---snip---

Everything I've read indicates that the data in shard 3 is lost. While
that's really unfortunate, it's not the end of the world. The real
problem is that all the nodes in the cluster now log this message
2-5 times PER SECOND, which is causing our logfiles to fill up.

Is there any way to recover from this error? Or any way to just tell the
cluster to give up and "scrap" the data in shard 3?

-S

You will need to delete the index. Which version are you using?
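For reference, deleting the whole index is a single call to the delete index API. A minimal sketch in Python, assuming a node reachable on localhost:9200 and the "ideas" index from the logs above; everything in the index is gone afterwards, so be ready to reindex from your source of truth:

---snip---
# Minimal sketch: delete the whole "ideas" index over the REST API.
# Assumes a node reachable on localhost:9200; adjust host/index as needed.
# This throws away every shard of the index, so only do it if you can
# rebuild the data from your source of truth.
import urllib.request

req = urllib.request.Request("http://localhost:9200/ideas", method="DELETE")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
---snip---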


That's extremely unfortunate... I was really hoping to avoid throwing
everything out, since it takes a while to reindex 30 million
documents...

Two nodes with 0.18.6 and one that I had just upgraded to 0.18.7.

-S

On Feb 8, 4:34 pm, Shay Banon kim...@gmail.com wrote:

You will need to delete the index. Which version are you using?



Shay, is there an API way to delete a specific 'bad' shard of an index? I'm
not familiar with this type of problem, and my answer here probably won't
help because it presumes a certain pattern of indexing your data. But if
there were a way to delete just the bad shard (if it is a bad shard issue),
and you store in Elasticsearch an ID and version matching the primary key
and version that come from your source of truth (say, the data you indexed
comes from a DB), then this is exactly what Scrutineer can help with:

https://github.com/Aconex/scrutineer

If the bad shard could be deleted, leaving a hole in your index data for
sure, then Scrutineer can let you work out which IDs are missing, and you
can reindex just those. Let's say you had 5 shards and 1 shard went bad:
then you would only need to reindex about 20% of your data.
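To make the approach concrete, here is a rough, hypothetical sketch of the kind of comparison Scrutineer does (not its actual code or CLI): stream (id, version) pairs, sorted by id, from both the source of truth and the index, then merge the two streams to find documents that are missing, orphaned, or stale. The in-memory lists stand in for however you export those streams.

---snip---
# Illustrative only: merge two id-sorted (id, version) streams to find documents
# that are missing, orphaned, or stale in the secondary index. This mimics the
# general idea behind Scrutineer; it is not Scrutineer's own code.

def diff_streams(db_pairs, index_pairs):
    """Yield (id, reason) for every document that needs attention."""
    db_iter, idx_iter = iter(db_pairs), iter(index_pairs)
    db, idx = next(db_iter, None), next(idx_iter, None)
    while db is not None or idx is not None:
        if idx is None or (db is not None and db[0] < idx[0]):
            yield db[0], "missing in index"       # in the DB, not in the index
            db = next(db_iter, None)
        elif db is None or idx[0] < db[0]:
            yield idx[0], "orphaned in index"     # in the index, not in the DB
            idx = next(idx_iter, None)
        else:
            if db[1] != idx[1]:
                yield db[0], "version mismatch"   # stale copy in the index
            db, idx = next(db_iter, None), next(idx_iter, None)

# In-memory stand-ins for "dump (id, version), sorted by id" from each side:
db_pairs = [(1, 3), (2, 1), (3, 7)]
index_pairs = [(1, 3), (3, 5), (4, 1)]
for doc_id, reason in diff_streams(db_pairs, index_pairs):
    print(doc_id, reason)   # -> 2 missing, 3 version mismatch, 4 orphaned
---snip---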

Yes, this is a little plug for Scrutineer, and it probably won't help you in
this case, but maybe next time you'll be able to use Scrutineer to recover
from failure scenarios more quickly. Having worked with Lucene-related
systems for 6 years now, all I can say is that planning for the absolute
worst-case scenario for large indexes is always a good idea. Everyone
should presume something terrible is going to happen to their index one day
(it could be as simple as someone accidentally issuing a delete index; no,
really, that can happen when someone doesn't realize they're connected to
the wrong box...)

cheers,

Paul Smith

Shay,

Does this error mean that all replicas are bad as well and the index can't
recover the shard from a replica?

Besides improving query performance, don't replicas also provide failover
when the shard itself is offline, corrupt, or otherwise unusable?

Heya Paul,

No, there isn't a way to say: "OK, I understand this shard is in a bad state, or I will never be able to recover it (because I lost all replicas of it with the local gateway, or because its state in the shared gateway got corrupted), so go ahead and create a fresh copy of it." We could possibly add that as an API option.
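In the meantime, the cluster health API shows how many shards are stuck, which at least tells you how bad things are before you decide to delete and reindex. A minimal sketch, assuming a node on localhost:9200; the field names come from the cluster health response and may differ slightly on these old 0.18.x releases:

---snip---
# Minimal sketch: check cluster health to see how many shards are not started.
# Assumes a node reachable on localhost:9200; adjust the host for your cluster.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:9200/_cluster/health") as resp:
    health = json.load(resp)

print(health["status"])                # green / yellow / red
print(health["initializing_shards"])   # shards still trying to recover
print(health["unassigned_shards"])     # shards with no live copy allocated
---snip---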


Wes, this only applies in the local gateway case, which is the recommended one to use. With the shared gateway option (the S3 gateway used here), the shared state on S3 is what counts, and replicas are only there for search performance, not HA.


I have a dumb question (and I realize it's veering away from the original
topic): why do you recommend the local gateway over the S3 gateway?

Seems to me that S3, while slower than the local gateway, is far simpler
for backups. And keeping the shared state on S3 means less load on other
nodes when adding a new node to the cluster.

-VegHead


The reason I recommend the local gateway is the overhead of continuously copying the data from the "local" storage of each node to S3. You could reduce the snapshot frequency (snapshot less often), but then, when doing a full restart, you will "miss" more, since the system always recovers from and syncs against S3.

The best solution is a combination: use the local gateway, but be able to back up into S3. A full restart will still use the local gateway, and only if explicitly desired, a restore operation can be run to restore from a specific backup on S3.
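Until such a backup option exists, backing up a local-gateway node manually amounts to copying its data directory somewhere safe and shipping that copy to S3 with whatever tool you like. A naive, hypothetical sketch; the paths are assumptions, and ideally you would pause indexing and flush first so the files on disk are consistent:

---snip---
# Naive, hypothetical backup for a local-gateway node: copy the node's data
# directory into a timestamped folder, which can then be synced to S3 with any
# external tool. Paths are assumptions; pause indexing and flush first so the
# files on disk are consistent.
import os
import shutil
import time

DATA_DIR = "/var/lib/elasticsearch/data"      # assumed node data path
BACKUP_ROOT = "/mnt/backups/elasticsearch"    # assumed local staging area

def backup_data_dir():
    dest = os.path.join(BACKUP_ROOT, time.strftime("%Y%m%d-%H%M%S"))
    shutil.copytree(DATA_DIR, dest)           # recursive copy of the data dir
    return dest

if __name__ == "__main__":
    print("backed up to", backup_data_dir())
---snip---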


Hi Shay,

Just to make it clear regarding the combination: when using the local
gateway, to be able to back up into S3 (or other storage) one has to do that
manually, i.e. there is no direct support in ES for the local gateway to
also do S3 "snapshotting", right?

And when you say "A full restart will still use the local gateway, and only
if explicitly desired, a restore operation can be run to restore from a
specific backup on S3", what exactly do you mean? Say I use the local
gateway and in a crash I lose some data (it cannot be recovered from the
local gateway). How can I explicitly tell an ES node to do a one-time
recovery from shared storage? Did you mean I need to manually copy a backup
of the node's local gateway data into the node's data folder before the
node starts?

Regards,
Lukas


On Monday, February 20, 2012 at 2:52 AM, Lukáš Vlček wrote:

Hi Shay,

Just to make it clear regarding the combination: when using the local gateway, to be able to back up into S3 (or other storage) one has to do that manually, i.e. there is no direct support in ES for the local gateway to also do S3 "snapshotting", right?
Right, at least not yet.

And when you say "A full restart will still use the local gateway, and only if explicitly desired, a restore operation can be run to restore from a specific backup on S3", what exactly do you mean? Say I use the local gateway and in a crash I lose some data (it cannot be recovered from the local gateway). How can I explicitly tell an ES node to do a one-time recovery from shared storage? Did you mean I need to manually copy a backup of the node's local gateway data into the node's data folder before the node starts?
If we have a "backup/restore" option, and you lost some data from the local gateway, you could just issue a "restore" API call for the specific indices.


Shay,
I am confused; you are probably talking about a possible future "restore"
API, but it is not implemented now, right? So right now, if one wants to
restore from a different location, it is manual work (probably copying into
the node's "data" folder before it is started, if I understand correctly).
Or am I missing something here?
Regards,
Lukas
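For what it's worth, that manual restore is roughly the mirror image of the manual backup: with the node stopped, put the backed-up files back into the node's data folder, then start the node. A hypothetical sketch under the same assumed paths as the backup sketch above:

---snip---
# Hypothetical manual "restore" for a local-gateway node: with the node stopped,
# replace its data directory with a previously taken backup, then start the
# node again. Paths are assumptions matching the backup sketch earlier.
import os
import shutil

DATA_DIR = "/var/lib/elasticsearch/data"                   # assumed node data path
BACKUP_DIR = "/mnt/backups/elasticsearch/20120220-030000"  # assumed backup to restore

def restore_data_dir():
    if os.path.exists(DATA_DIR):
        shutil.rmtree(DATA_DIR)                # drop the damaged data directory
    shutil.copytree(BACKUP_DIR, DATA_DIR)      # put the backup in its place

if __name__ == "__main__":
    restore_data_dir()
    print("restored %s -> %s; now start the node" % (BACKUP_DIR, DATA_DIR))
---snip---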


Yes, "restore" is a future possible API and functionality.
