Error creating snapshot with HDFS plugin, FileSystemClosed

tgreischel · June 9, 2015, 5:33pm

ES - 1.5
ES Hadoop plugin - 2.1.0 b4

{"error":"RepositoryVerificationException[[esprod-usagetracking-2015-06] path is not accessible on master node]; nested: IOException[Filesystem closed]; ","status":500}

The error above comes back from a cronjob, but only every other morning. That's the strange part: it works once and then fails on the next run, so I only have snapshots for every other day. The snapshots themselves do work, but again, only every other run. When I run the script over and over, it returns the error on every other run.

Has anyone seen this before?

costin · June 9, 2015, 6:09pm

Weird. It looks like the FileSystem object used underneath by the plugin is affected somehow. What distro are you using? The plugin never closes the FileSystem, nor does it keep creating a new one; however, other Hadoop clients running on the same machine might interfere with it, as the FileSystem relies on an internal cache that can be affected.
Do you have a bigger stacktrace (potentially from Elasticsearch itself)?
Can you double check whether there are other jobs interfering with Hadoop every other day? Anything that sticks out from the Hadoop logs?
Do you restart Elasticsearch by any chance?
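
To make the cache interaction described above concrete, here is a minimal sketch of the failure mode. It is a toy model only: `ToyFileSystem`, `ToyFileSystemCache`, the URI and the path are made-up stand-ins, not Hadoop or plugin classes. The assumption is that two callers asking a shared cache for the same URI receive the same handle, so a close by one of them breaks the other.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a filesystem client handle (hypothetical, NOT Hadoop's FileSystem).
class ToyFileSystem {
    private boolean open = true;

    void close() {
        open = false;
    }

    // Any operation on a closed handle fails, like the "Filesystem closed" error in this thread.
    String create(String path) throws IOException {
        if (!open) {
            throw new IOException("Filesystem closed");
        }
        return "created " + path;
    }
}

// Toy process-wide cache: every caller asking for the same URI shares one handle.
class ToyFileSystemCache {
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    static synchronized ToyFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new ToyFileSystem());
    }
}

class SharedCacheDemo {
    // Returns the result of a write attempted after another client closed the shared handle.
    static String snapshotAfterForeignClose(String uri) {
        ToyFileSystem pluginFs = ToyFileSystemCache.get(uri);
        ToyFileSystem otherClientFs = ToyFileSystemCache.get(uri); // same object as pluginFs
        otherClientFs.close(); // the "other client" is done and closes its handle
        try {
            return pluginFs.create("/snapshots/demo"); // hypothetical path
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints "Filesystem closed"
        System.out.println(snapshotAfterForeignClose("hdfs://namenode:8020"));
    }
}
```

The handle stays closed in the cache, so every later lookup for that URI in this model returns the same dead object until something replaces it.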

tgreischel · June 9, 2015, 6:20pm

Running on CentOS 6.6. ES is the only thing running on this machine, and the snapshot script is the only cronjob. ES is not restarted between snapshot runs, but it has been restarted before. Below is from the logs on 6/4:

[2015-06-04 08:45:12,011][INFO ][repositories             ] [esprod00] update repository [esprod-usagetracking-2015-06]
[2015-06-04 08:45:12,068][WARN ][snapshots                ] [esprod00] failed to create snapshot [esprod-usagetracking-2015-06:snapshot-2015-06-04]
org.elasticsearch.snapshots.SnapshotCreationException: [esprod-usagetracking-2015-06:snapshot-2015-06-04] failed to create snapshot
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.initializeSnapshot(BlobStoreRepository.java:260)
        at org.elasticsearch.snapshots.SnapshotsService.beginSnapshot(SnapshotsService.java:278)
        at org.elasticsearch.snapshots.SnapshotsService.access$600(SnapshotsService.java:88)
        at org.elasticsearch.snapshots.SnapshotsService$1$1.run(SnapshotsService.java:204)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1448)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:849)
        at org.elasticsearch.hadoop.hdfs.blobstore.HdfsBlobContainer.createOutput(HdfsBlobContainer.java:71)
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.initializeSnapshot(BlobStoreRepository.java:235)

costin · June 9, 2015, 8:35pm

This is helpful. What Hadoop version/distro are you using?

P.S. Can you please use code formatting? It's really simple and improves readability a lot. Thanks

costin · June 9, 2015, 9:23pm

@tgreischel Hi
I've pushed a couple of updates which hopefully should fix your problem.

1. The FileSystem instance is now checked to see whether it's alive or not, so in case it is closed, a new one will be created.
2. Instead of using the typical API, which relies on some Hadoop client caching (which can cause the FileSystem to be closed by other clients), a dedicated, private instance is now created, which should be managed just by the plugin itself (though there is a shutdown hook that might close it, however see #1).
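
The two changes listed above can be sketched roughly as follows. This is a toy model under stated assumptions, not the plugin's actual code: `ToyHandle`, `PrivateHandleHolder`, `isOpen()` and the paths are hypothetical names, and the thread does not say how the plugin really detects a closed FileSystem. In Hadoop itself, `FileSystem.get(...)` returns cached instances while `FileSystem.newInstance(...)` returns a fresh one; whether the plugin uses exactly that call is an assumption here.

```java
import java.io.IOException;

// Toy stand-in for a filesystem handle (hypothetical, NOT Hadoop's FileSystem).
class ToyHandle {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }

    String create(String path) throws IOException {
        if (!open) {
            throw new IOException("Filesystem closed");
        }
        return "created " + path;
    }
}

// Hypothetical holder: owns a private handle and replaces it when it is found closed.
class PrivateHandleHolder {
    private ToyHandle handle;
    private int created = 0;

    synchronized ToyHandle get() {
        // Liveness check: recreate the handle if it is missing or closed.
        if (handle == null || !handle.isOpen()) {
            handle = new ToyHandle(); // private instance, never handed to a shared cache
            created++;
        }
        return handle;
    }

    synchronized int instancesCreated() {
        return created;
    }
}

class PrivateHandleDemo {
    static String run() throws IOException {
        PrivateHandleHolder holder = new PrivateHandleHolder();
        String first = holder.get().create("/snapshots/run-1");
        holder.get().close(); // simulate something closing the handle between runs
        String second = holder.get().create("/snapshots/run-2"); // holder recovers
        return first + "; " + second + "; instances=" + holder.instancesCreated();
    }

    public static void main(String[] args) throws IOException {
        // Prints "created /snapshots/run-1; created /snapshots/run-2; instances=2"
        System.out.println(run());
    }
}
```

The liveness check is what lets the second run succeed even though the handle was closed in between; the private instance only reduces how often that should happen.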

I have already pushed a new dev build to the repository. Can you please try it out and let me know how it works for you? You shouldn't get the exception any more, even on subsequent runs.

Cheers,

tgreischel · June 10, 2015, 1:14pm

That worked perfectly! The backup ran great without error. Thanks again for the help!!

costin · June 10, 2015, 5:37pm

Glad to hear it. Cheers!
