Snapshot failing

Hi everybody,

I am getting the following error even though the cluster health is green.
Any help?

[2012-11-14 23:59:59,971][WARN ][index.gateway ] [pgossamerv01_slave3] [pdeployment3763380876869935][3] failed to snapshot (scheduled)
org.elasticsearch.index.gateway.IndexShardGatewaySnapshotFailedException: [pdeployment3763380876869935][3] Failed to perform snapshot (index files)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.doSnapshot(BlobStoreIndexShardGateway.java:246)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshot(BlobStoreIndexShardGateway.java:160)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:271)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:265)
    at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:1042)
    at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:528)
    at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:265)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$SnapshotRunnable.run(IndexShardGatewayService.java:366)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /var/lib/elasticsearch/pgossamerv01/nodes/0/indices/pdeployment3763380876869935/3/index/segments_4 (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:71)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:98)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
    at org.elasticsearch.index.store.Store.openInputRaw(Store.java:314)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshotFile(BlobStoreIndexShardGateway.java:755)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.doSnapshot(BlobStoreIndexShardGateway.java:228)
    ... 10 more

--

Hi,

That looks a little messy. Are you able to share some information about
your setup? Elasticsearch version, any hardware information, when did these
errors begin, that sort of thing.


--

The cluster consists of 2 EC2 instances (Ubuntu 12.04 server) on AWS, with S3
as the gateway.

The 1st EC2 instance is an ES HTTP client node (2 GB RAM).
The 2nd EC2 instance is the ES master and data node (8 GB RAM).

ES version is 0.19.8

Index data size is about 1.5 GB, spread across approximately 100 indices.

The cluster had been up for more than 30 days and everything was fine; then one
day the logs filled up with the error above, and it has been happening
continuously ever since.

That said, I can still do PUT, POST, GET, and DELETE operations, and the
cluster health is green.
Memory, CPU utilization, file descriptors, etc. are all under control.
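
In case it helps, the green status comes from the standard cluster health
endpoint (default host and port assumed here; yours may differ):

    curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'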

Let me know if you need more information about the error.

Thanks :)


--
Sujan

--

The cluster consists of 2 EC2 instances (Ubuntu 12.04 server) on AWS, with S3
as the gateway.

The S3 gateway is on its way to being deprecated, and using it is not
recommended. There's potential for index corruption and weird issues.
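
For context, an S3 gateway setup is typically configured along these lines in
elasticsearch.yml; the bucket name below is a placeholder, and the exact
settings depend on your version of the cloud-aws plugin:

    gateway.type: s3
    gateway.s3.bucket: your-s3-bucket    # placeholder; the bucket currently holding the gateway data

Migrating means dropping this in favour of the local gateway, as described below.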

The best solution for you is to migrate to an EBS-backed local gateway. You
can use Elasticsearch itself and an increased number of replicas for that. The
process would be as follows:

  • Create two new provisioned IOPS EBS volumes [1], with enough space to hold your data
  • Launch new EC2 instances with the proper security groups
  • Mount the EBS volumes on the new instances [2], at a suitable location such as
    /usr/local/var/elasticsearch/data1
  • Install and configure elasticsearch on these machines, using the same
    cluster name as your original cluster and a local gateway, with the data
    path pointed at the location where you mounted the EBS volume (a
    configuration sketch follows this list)
  • Launch elasticsearch on these new instances
  • Increase the number_of_replicas for your indices to four (i.e. equal to
    the number of nodes), as shown after this list. Your data will now be
    spread across all the nodes: the old ones and the new ones.
  • Use the Paramedic, BigDesk or Head elasticsearch plugins to monitor cluster
    health: once the cluster is "green" and all shards are allocated, you can
    shut down the old, S3-based nodes
  • You have now migrated all data to the new nodes. Best practice at this
    point is to take a snapshot of your EBS volumes, so you have a recovery
    strategy. You can delete the S3 buckets after doing that.
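
A minimal sketch of the elasticsearch.yml for the new nodes could look like
this; the cluster name and paths are examples and should be adapted to your
environment:

    # elasticsearch.yml on each new node (example values)
    cluster.name: your-existing-cluster-name       # must match the old cluster so the new nodes join it
    gateway.type: local                             # local gateway instead of the S3 gateway
    path.data: /usr/local/var/elasticsearch/data1   # the EBS mount point

Raising the replica count for all indices is then a single update-settings
call (the count of 4 is just the example from above):

    curl -XPUT 'http://localhost:9200/_settings' -d '{
      "index" : { "number_of_replicas" : 4 }
    }'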

This strategy also allows you to scale when the volume of your data grows but
the computing capacity of your cluster is still sufficient: you can create a
new set of EBS volumes, mount them at a location such as
/usr/local/var/elasticsearch/data2, and point the Elasticsearch path.data
setting to both locations (it is possible to use multiple directories as the
data path).
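
For example, a hypothetical two-volume layout (paths are illustrative):

    path.data: ["/usr/local/var/elasticsearch/data1", "/usr/local/var/elasticsearch/data2"]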

Karel

[1]
http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
[2]

--