Error creating snapshot with HDFS plugin, FileSystemClosed

(Tony Greising Murschel) #1

ES - 1.5
ES Hadoop plugin - 2.1.0 b4

{"error":"RepositoryVerificationException[[esprod-usagetracking-2015-06] path is not accessible on master node]; nested: IOException[Filesystem closed]; ","status":500}

The above is the error received every other morning from a cronjob. That's the strange part: it works once and then fails on the next run, so I only have snapshots for every other day. The snapshots do work, but only every other run; when I run the script repeatedly, it returns the error on every other attempt.

Has anyone seen this before?

(Costin Leau) #2

Weird. It looks like the FileSystem object used underneath by the plugin is somehow being affected. What distro are you using? The plugin never closes the FileSystem, nor does it keep creating new ones; however, other Hadoop clients running on the same machine might interfere with it, since the FileSystem relies on an internal cache that they can affect.
Do you have a bigger stacktrace (potentially from Elasticsearch itself)?
Can you double check whether there are other jobs interfering with Hadoop every other day? Anything that sticks out from the Hadoop logs?
Do you restart Elasticsearch by any chance?
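The shared-cache hazard described above can be sketched as follows. This is an illustrative simulation, not the actual Hadoop classes: `FakeFileSystem` and the cache stand in for `org.apache.hadoop.fs.FileSystem`, whose `get()` returns a cached, shared instance, so a `close()` by any holder (another client, a shutdown hook) closes it for everyone:

```java
import java.util.HashMap;
import java.util.Map;

public class FsCacheDemo {
    // Minimal stand-in for org.apache.hadoop.fs.FileSystem.
    static class FakeFileSystem {
        private boolean open = true;
        void close() { open = false; }
        boolean isOpen() { return open; }
    }

    private static final Map<String, FakeFileSystem> CACHE = new HashMap<>();

    // Analogous to FileSystem.get(uri, conf): returns a shared, cached instance.
    static FakeFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new FakeFileSystem());
    }

    // Analogous to FileSystem.newInstance(uri, conf): bypasses the cache.
    static FakeFileSystem newInstance(String uri) {
        return new FakeFileSystem();
    }

    public static void main(String[] args) {
        FakeFileSystem a = get("hdfs://namenode:8020");
        FakeFileSystem b = get("hdfs://namenode:8020");
        System.out.println(a == b);             // true: same cached object
        b.close();                              // "another client" closes it
        System.out.println(a.isOpen());         // false: our handle died too
        System.out.println(newInstance("hdfs://namenode:8020").isOpen()); // true
    }
}
```

In real Hadoop the cache is keyed by scheme, authority, and user, which is why an unrelated client on the same JVM or machine configuration can invalidate a handle the plugin is still holding.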

(Tony Greising Murschel) #3

Running on CentOS 6.6. ES is the only thing running on this machine and is the only cronjob. ES is not restarted in between each snapshot creation, but it has been restarted before. Below is from the logs on 6/4:

[2015-06-04 08:45:12,011][INFO ][repositories             ] [esprod00] update repository [esprod-usagetracking-2015-06]
[2015-06-04 08:45:12,068][WARN ][snapshots                ] [esprod00] failed to create snapshot [esprod-usagetracking-2015-06:snapshot-2015-06-04]
org.elasticsearch.snapshots.SnapshotCreationException: [esprod-usagetracking-2015-06:snapshot-2015-06-04] failed to create snapshot
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.initializeSnapshot(
        at org.elasticsearch.snapshots.SnapshotsService.beginSnapshot(
        at org.elasticsearch.snapshots.SnapshotsService.access$600(
        at org.elasticsearch.snapshots.SnapshotsService$1$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(
        at org.apache.hadoop.hdfs.DFSClient.create(
        at org.apache.hadoop.hdfs.DFSClient.create(
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(
        at org.apache.hadoop.fs.FileSystem.create(
        at org.apache.hadoop.fs.FileSystem.create(
        at org.apache.hadoop.fs.FileSystem.create(
        at org.elasticsearch.hadoop.hdfs.blobstore.HdfsBlobContainer.createOutput(
        at org.elasticsearch.repositories.blobstore.BlobStoreRepository.initializeSnapshot(

(Costin Leau) #4

This is helpful. What Hadoop version/distro are you using?

P.S. Can you please use code formatting for logs? It's really simple and improves readability a lot. Thanks.

(Costin Leau) #5

@tgreischel Hi
I've pushed a couple of updates that should hopefully fix your problem:

  1. The FileSystem instance is now checked to see whether it is still alive; if it has been closed, a new one is created.
  2. Instead of using the typical API, which relies on Hadoop's client-side caching (and allows other clients to close the shared FileSystem), a dedicated, private instance is now created and managed by the plugin itself. (A shutdown hook might still close it, but then #1 kicks in.)
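The two fixes above can be sketched together. This is a simplified illustration with assumed names, not the plugin's actual code; `FakeFileSystem` stands in for `org.apache.hadoop.fs.FileSystem`, and `createPrivateInstance()` plays the role of `FileSystem.newInstance()`:

```java
public class ResilientFsHolder {
    // Minimal stand-in for org.apache.hadoop.fs.FileSystem.
    static class FakeFileSystem {
        private boolean open = true;
        void close() { open = false; }
        boolean isOpen() { return open; }
    }

    private FakeFileSystem fs;

    // Fix #2: a dedicated, uncached instance (like FileSystem.newInstance()),
    // so other clients sharing the cache cannot close it out from under us.
    private FakeFileSystem createPrivateInstance() {
        return new FakeFileSystem();
    }

    // Fix #1: check liveness before every use and recreate if closed
    // (e.g. by a JVM shutdown hook).
    synchronized FakeFileSystem getFileSystem() {
        if (fs == null || !fs.isOpen()) {
            fs = createPrivateInstance();
        }
        return fs;
    }
}
```

With this pattern, even if something does close the handle, the next snapshot operation transparently gets a fresh one instead of failing with "Filesystem closed".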

I have pushed a new dev build to the repository already; can you please try it out and let me know how it works for you? You shouldn't get the exception any more, even on subsequent runs.


(Tony Greising Murschel) #6

That worked perfectly! The backup ran without error. Thanks again for the help!!

(Costin Leau) #7

Glad to hear it. Cheers!
