We have been running into a confusing issue with the repository-hdfs plugin on our Elasticsearch cluster. The best way I can describe the behavior is that the plugin's security policy does not seem to "stick".
Here is what happens. We start the Elasticsearch cluster, then create a new repository:
curl -XPUT --data-binary @- localhost:9200/_snapshot/testrepo <<JSON
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://hdfs-host.example.com:8020/",
    "path": "/path/to/snapshot/dir",
    "conf_location": "/path/to/hdfs-site.xml"
  }
}
JSON
Then we attempt to verify it:
curl -XPOST localhost:9200/_snapshot/testrepo/_verify
The first time we run this on a node, Elasticsearch acknowledges the request and the repository verification succeeds. However, a subsequent attempt to create a snapshot fails.
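The snapshot call itself is the standard create-snapshot API (the repository and snapshot names below match the params echoed in the log):

curl -XPUT 'localhost:9200/_snapshot/testrepo/snapshot1?wait_for_completion=true'

It fails with this error: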
2017-07-12_19:00:49.25873 [2017-07-12T19:00:49,211][WARN ][r.suppressed ] path: /_snapshot/testrepo/snapshot1, params: {repository=testrepo, wait_for_completion=true, snapshot=snapshot1}
2017-07-12_19:00:49.25876 org.elasticsearch.repositories.RepositoryException: [testrepo] could not read repository data from index blob
2017-07-12_19:00:49.25876 at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:800) ~[elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25876 at org.elasticsearch.snapshots.SnapshotsService.createSnapshot(SnapshotsService.java:236) ~[elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25876 at org.elasticsearch.action.admin.cluster.snapshots.create.TransportCreateSnapshotAction.masterOperation(TransportCreateSnapshotAction.java:82) ~[elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25877 at org.elasticsearch.action.admin.cluster.snapshots.create.TransportCreateSnapshotAction.masterOperation(TransportCreateSnapshotAction.java:41) ~[elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25877 at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) ~[elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25878 at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) ~[elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25878 at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25878 at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.3.jar:5.4.3]
2017-07-12_19:00:49.25878 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
2017-07-12_19:00:49.25879 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
2017-07-12_19:00:49.25879 at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
2017-07-12_19:00:49.25879 Caused by: java.io.IOException: com.google.protobuf.ServiceException: java.security.AccessControlException: access denied ("javax.security.auth.PrivateCredentialPermission" "org.apache.hadoop.security.Credentials" "read")
2017-07-12_19:00:49.25879 at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47) ~[?:?]
2017-07-12_19:00:49.25880 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:580) ~[?:?]
2017-07-12_19:00:49.25880 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
2017-07-12_19:00:49.25880 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
2017-07-12_19:00:49.25881 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2017-07-12_19:00:49.25881 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
2017-07-12_19:00:49.25881 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) ~[?:?]
2017-07-12_19:00:49.25881 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[?:?]
2017-07-12_19:00:49.25881 at com.sun.proxy.$Proxy34.getListing(Unknown Source) ~[?:?]
2017-07-12_19:00:49.25882 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2094) ~[?:?]
2017-07-12_19:00:49.25883 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2077) ~[?:?]
2017-07-12_19:00:49.25883 at org.apache.hadoop.fs.Hdfs.listStatus(Hdfs.java:254) ~[?:?]
2017-07-12_19:00:49.25883 at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1798) ~[?:?]
2017-07-12_19:00:49.25884 at org.apache.hadoop.fs.FileContext$Util$1.next(FileContext.java:1794) ~[?:?]
2017-07-12_19:00:49.25884 at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) ~[?:?]
2017-07-12_19:00:49.25884 at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1800) ~[?:?]
2017-07-12_19:00:49.25884 at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1759) ~[?:?]
2017-07-12_19:00:49.25884 at org.apache.hadoop.fs.FileContext$Util.listStatus(FileContext.java:1718) ~[?:?]
2017-07-12_19:00:49.25885 at org.elasticsearch.repositories.hdfs.HdfsBlobContainer$6.run(HdfsBlobContainer.java:145) ~[?:?]
2017-07-12_19:00:49.25885 at org.elasticsearch.repositories.hdfs.HdfsBlobContainer$6.run(HdfsBlobContainer.java:142) ~[?:?]
2017-07-12_19:00:49.25885 at org.elasticsearch.repositories.hdfs.HdfsBlobStore.lambda$execute$0(HdfsBlobStore.java:132) ~[?:?]
2017-07-12_19:00:49.25885 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
2017-07-12_19:00:49.25885 at java.security.AccessController.doPrivileged(AccessController.java:713) ~[?:1.8.0_112]
...
Furthermore, if we restart Elasticsearch, repository verification fails with a similar security exception.
The only way I have found to make these errors go away is to manually point the JVM directly at the plugin's security policy file, via -Djava.security.policy=$ES_HOME/plugins/repository-hdfs/plugin-security.policy
... which seems wrong.
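Concretely, the workaround looks like this (assuming a tarball install where config lives under $ES_HOME/config; package installs keep jvm.options elsewhere, e.g. /etc/elasticsearch):

# Option 1: add the flag to jvm.options so it survives restarts.
# jvm.options does not expand environment variables, so the shell
# expands ${ES_HOME} here before the line is written.
# Note: a single '=' appends this policy to the default policy files;
# '==' would replace them entirely, which is not what we want here.
echo "-Djava.security.policy=${ES_HOME}/plugins/repository-hdfs/plugin-security.policy" \
  >> "${ES_HOME}/config/jvm.options"

# Option 2: pass it for a single run via ES_JAVA_OPTS.
ES_JAVA_OPTS="-Djava.security.policy=${ES_HOME}/plugins/repository-hdfs/plugin-security.policy" \
  "${ES_HOME}/bin/elasticsearch"

With either of these in place, both verification and snapshot creation succeed for us.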
After looking through the code, it appears that these policy files should be loaded and applied automatically when a plugin is installed, without any extra JVM flags.
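In fact, the policy file that the workaround points at appears to grant exactly the permission being denied. This is a paraphrased excerpt based on my reading of the plugin source, not a verbatim copy of the file, so the surrounding grants may differ:

grant {
  // ... other Hadoop-related grants ...
  permission javax.security.auth.PrivateCredentialPermission "org.apache.hadoop.security.Credentials * \"*\"", "read";
};

So if this policy were actually in effect, I would not expect the AccessControlException above.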
I have also been unable to find other users hitting the same error, which makes me think there is something peculiar about our setup or environment, but nothing has stood out so far. Has anybody else experienced a similar error? Are any of my assumptions about how the plugin architecture works incorrect? And what else could I provide to help debug this?
Thanks very much in advance.