Snapshot failing

sujan_dutta · November 19, 2012, 3:50am

Hi everybody,

I am getting the following error even though the cluster health is green.
Any help?

[2012-11-14 23:59:59,971][WARN ][index.gateway ] [pgossamerv01_slave3] [pdeployment3763380876869935][3] failed to snapshot (scheduled)
org.elasticsearch.index.gateway.IndexShardGatewaySnapshotFailedException: [pdeployment3763380876869935][3] Failed to perform snapshot (index files)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.doSnapshot(BlobStoreIndexShardGateway.java:246)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshot(BlobStoreIndexShardGateway.java:160)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:271)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:265)
    at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:1042)
    at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:528)
    at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:265)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$SnapshotRunnable.run(IndexShardGatewayService.java:366)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /var/lib/elasticsearch/pgossamerv01/nodes/0/indices/pdeployment3763380876869935/3/index/segments_4 (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:71)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:98)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
    at org.elasticsearch.index.store.Store.openInputRaw(Store.java:314)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshotFile(BlobStoreIndexShardGateway.java:755)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.doSnapshot(BlobStoreIndexShardGateway.java:228)
    ... 10 more

--

Chris_Male · November 20, 2012, 3:32am

Hi,

That looks a little messy. Are you able to share some information about
your setup? Elasticsearch version, any hardware information, when did these
errors begin, that sort of thing.


--

sujan_dutta · November 20, 2012, 4:19am

The cluster consists of 2 EC2 instances (Ubuntu 12.04 Server) on AWS, with
S3 as the gateway.

1st EC2 instance is the ES HTTP client (2 GB RAM)
2nd EC2 instance is the ES master and data node (8 GB RAM)

ES version is 0.19.8

Total index data size is about 1.5 GB, spread across approximately 100
indices.

The cluster had been up for more than 30 days and everything was fine. Then
one day the logs suddenly filled up with the message from my first post, and
it has been happening continuously ever since.

However, I can still perform PUT, POST, GET and DELETE operations, and the
cluster health is green.
Memory, CPU utilization, file descriptors etc. are all under control.

Let me know if you need more information about the error.

Thanks

--
Sujan

--

karmi · November 21, 2012, 10:36am

Cluster are of 2 EC2 instance (ubuntu 12.04 server) of AWS with S3 as
gateway.

The S3 gateway is on its way to being deprecated, and it is not recommended
to use it. There's potential for index corruption and weird issues.

The best solution for you is to migrate to an EBS-backed local gateway. You
can use elasticsearch itself and an increased number of replicas for that.
The process would be as follows:

1. Create two new IOPS EBS volumes [1], with enough space to hold your data.
2. Launch a new EC2 instance with the proper security groups.
3. Mount the EBS volume on the new instance [2], at a good location such as
   /usr/local/var/elasticsearch/data1.
4. Install and configure elasticsearch on the machine, using the same
   cluster name as your original cluster and a local gateway, pointed to
   the location where you mounted the EBS volume.
5. Launch elasticsearch on the new instances.
6. Increase the number_of_replicas for your indices to four (i.e. equal to
   the number of nodes). Your data will now be spread across all the nodes:
   the old ones and the new ones.
7. Use the Paramedic, BigDesk or Head elasticsearch plugins to monitor
   cluster health: once you're in "green" health and all shards are
   allocated, you can shut down the old, S3-based nodes.
8. You have now migrated all data to a new cluster. The best practice would
   be to take a snapshot of your EBS volumes, so you have a recovery
   strategy. You can delete the S3 buckets after doing that.
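
The replica increase and the health check described above can also be scripted instead of done through a plugin. What follows is only a minimal sketch using the Python standard library, not something from the original setup: the node address http://localhost:9200 and the helper names (replica_settings_request, safe_to_retire_old_nodes, fetch_health) are assumptions made up for illustration. It relies on the index update settings endpoint (PUT /<index>/_settings) and the cluster health endpoint (GET /_cluster/health).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster


def replica_settings_request(index, replicas, base_url=ES_URL):
    """Build (without sending) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def safe_to_retire_old_nodes(health):
    """True when a cluster health response is green with no shards pending or moving."""
    return (
        health.get("status") == "green"
        and health.get("unassigned_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("relocating_shards", 0) == 0
    )


def fetch_health(base_url=ES_URL):
    """Fetch and decode GET /_cluster/health (requires a running cluster)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To apply the change you would pass the built request to urllib.request.urlopen, then poll fetch_health until safe_to_retire_old_nodes returns True before shutting anything down. One thing to keep in mind when choosing the replica count: elasticsearch does not allocate two copies of the same shard to one node, so the cluster can only reach green if number_of_replicas is at most the number of data nodes minus one.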

This strategy also allows you to scale when the volume of your data grows
while the computing capacity of your cluster is still enough: you can create
a new set of EBS volumes, mount them to a location such as
/usr/local/var/elasticsearch/data2, and point the elasticsearch path.data
setting to both locations (it is possible to use multiple directories as
path.data).
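
For reference, the settings involved are cluster.name, gateway.type and path.data in elasticsearch.yml, where path.data accepts a comma-separated list of directories. The small Python helper below is only a sketch that renders such a fragment; the function name and the cluster name "my-cluster" are placeholders invented for this example, and the directories are the ones suggested above.

```python
def render_local_gateway_config(cluster_name, data_paths):
    """Return elasticsearch.yml lines for a local gateway with one or more data directories."""
    if not data_paths:
        raise ValueError("at least one data path is required")
    return "\n".join([
        f"cluster.name: {cluster_name}",          # must match the existing cluster
        "gateway.type: local",                    # local gateway instead of S3
        f"path.data: {','.join(data_paths)}",     # multiple directories, comma-separated
    ])


if __name__ == "__main__":
    print(render_local_gateway_config(
        "my-cluster",  # placeholder: use your real cluster name
        ["/usr/local/var/elasticsearch/data1", "/usr/local/var/elasticsearch/data2"],
    ))
```

Adding a second directory later means editing path.data on the node and restarting it, so it is worth doing one node at a time while the cluster is green.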

Karel

[1] http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
[2]

[3]

--
