Getting repository verification exception while creating HDFS snapshot repository


#1

I have a single-node Hadoop machine running in OpenStack along with my 9-node (3 master, 3 data, 3 client) Elasticsearch 2.2.0 cluster. I want to store periodic snapshots in the HDFS store of that single-node Hadoop machine. The Hadoop processes seem to be running fine:

ubuntu@hadoop-singlenode:~$ jps
16337 DataNode
11973 SecondaryNameNode
12299 NodeManager
12134 ResourceManager
11585 NameNode
17956 Jps

I can also curl the datanode URL endpoint from the outside.

Here are the things I've done to set up the HDFS repository:
On all nodes,

  1. Added security.manager.enabled: false in elasticsearch.yml

  2. Configured the repository properties in elasticsearch.yml as:

    repositories.hdfs.uri: "hdfs://192.168.10.206:9000/"
    repositories.hdfs.path: "snaps"
    repositories.hdfs.load_defaults: "true"
    repositories.hdfs.conf_location: ["/etc/elasticsearch/core-site.xml","/etc/elasticsearch/hdfs-site.xml"] #copied these from my hadoop node
    repositories.hdfs.concurrent_streams: 5
    repositories.hdfs.compress: "true"
    repositories.hdfs.chunk_size: "10mb"

  3. Installed the plugin with bin/plugin install elasticsearch/elasticsearch-repository-hdfs/2.2.0

  4. Added /etc/elasticsearch/core-site.xml (copied from the hadoop machine):

<configuration>
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.10.206:9000</value>
    </property>
</configuration>
  5. Added /etc/elasticsearch/hdfs-site.xml (copied from the hadoop machine):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>192.168.10.206:50070</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
    </property>
</configuration>
  6. Restarted all nodes one-by-one

  7. Fired the repository creation API with the following JSON:

{
  "type": "hdfs",
  "settings": {
    "url": "hdfs://192.168.10.206:9000",
    "path": "snaps",
    "conf_location": ["/etc/elasticsearch/core-site.xml","/etc/elasticsearch/hdfs-site.xml"]
  }
}

Questions:

  1. Without doing anything else, I get an i_o_exception saying Mkdirs failed to create file:/usr/share/elasticsearch/snaps/tests-SEbzlD19QSa9XYPL3A5aeA. Why is Elasticsearch trying to create these test files in the local filesystem? Shouldn't it be creating them in the HDFS store I provided in the JSON above?
  2. I created a /usr/share/elasticsearch/snaps/ directory with 777 permissions. Now I get repository_verification_exception. Here is the response I get (semi-formatted for readability):
[hdfs_repo12] [
[aZmOht1qQEGtbDkfujD7sw, 'RemoteTransportException[[master1][192.168.10.227:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[hdfs_repo12] a file written by master to the store [file:/usr/share/elasticsearch/snaps] cannot be accessed on the node [{master1}{aZmOht1qQEGtbDkfujD7sw}{192.168.10.227}{192.168.10.227:9300}{data=false, master=true}]. This might indicate that the store [file:/usr/share/elasticsearch/snaps] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'],
[KLQe6APPSEWBRua0_L0eEw, 'RemoteTransportException[[data2][192.168.10.231:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[hdfs_repo12] a file written by master to the store [file:/usr/share/elasticsearch/snaps] cannot be accessed on the node [{data2}{KLQe6APPSEWBRua0_L0eEw}{192.168.10.231}{192.168.10.231:9300}{master=false}]. This might indicate that the store [file:/usr/share/elasticsearch/snaps] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'],
[MycDJAiRTwaLsG4hvh_sGw, 'RemoteTransportException[[data3][192.168.10.232:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[hdfs_repo12] a file written by master to the store [file:/usr/share/elasticsearch/snaps] cannot be accessed on the node [{data3}{MycDJAiRTwaLsG4hvh_sGw}{192.168.10.232}{192.168.10.232:9300}{master=false}]. This might indicate that the store [file:/usr/share/elasticsearch/snaps] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];'],
[IK4RltZ2ToqEQMcxpWOw1w, 'RemoteTransportException[[data1]
.
.
.
(truncated)
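
One thing worth noting in the response above: the store it reports is file:/usr/share/elasticsearch/snaps, whose scheme is file rather than hdfs, which suggests the repository was resolved against the local filesystem and not the HDFS URI. A minimal Python sketch of that check follows; store_scheme is a hypothetical helper for illustration, not part of Elasticsearch or the plugin.

```python
from urllib.parse import urlparse


def store_scheme(store: str) -> str:
    """Return the URI scheme of a store location string."""
    return urlparse(store).scheme


# Store string copied from the verification error above.
reported = "file:/usr/share/elasticsearch/snaps"
# URI the repository was meant to use (from the repository settings).
intended = "hdfs://192.168.10.206:9000"

# If the two schemes differ, the repository is not pointing where intended.
print("reported:", store_scheme(reported))
print("intended:", store_scheme(intended))
print("match:", store_scheme(reported) == store_scheme(intended))
```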

What am I doing wrong?


(Saurabh Goyal) #2

I am also facing the same issue.


#3

I reinstalled the plugin and did it again. Now I get this as a response:

{
  "error": {
    "root_cause": [
      {
        "type": "i_o_exception",
        "reason": "error=13, Permission denied"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[hdfs_repo16] path  is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Cannot run program \"chmod\": error=13, Permission denied",
      "caused_by": {
        "type": "i_o_exception",
        "reason": "error=13, Permission denied"
      }
    }
  },
  "status": 500
}

The relevant error block in the logs is as follows:

[2016-04-08 06:09:29,542][INFO ][rest.suppressed          ] /_snapshot/hdfs_repo16 Params: {wait_for_timeout=true, repository=hdfs_repo16}
RemoteTransportException[[master2][192.168.10.228:9300][cluster:admin/repository/put]]; nested: RepositoryVerificationException[[hdfs_repo16] path  is not accessible on master node]; nested: NotSerializableExceptionWrapper[Cannot run program "chmod": error=13, Permission denied]; nested: NotSerializableExceptionWrapper[error=13, Permission denied];
Caused by: RepositoryVerificationException[[hdfs_repo16] path  is not accessible on master node]; nested: NotSerializableExceptionWrapper[Cannot run program "chmod": error=13, Permission denied]; nested: NotSerializableExceptionWrapper[error=13, Permission denied];
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:650)
	at org.elasticsearch.repositories.RepositoriesService.verifyRepository(RepositoriesService.java:211)
	at org.elasticsearch.repositories.RepositoriesService$VerifyingRegisterRepositoryListener.onResponse(RepositoriesService.java:436)
	at org.elasticsearch.repositories.RepositoriesService$VerifyingRegisterRepositoryListener.onResponse(RepositoriesService.java:421)
	at org.elasticsearch.cluster.AckedClusterStateUpdateTask.onAllNodesAcked(AckedClusterStateUpdateTask.java:63)
	at org.elasticsearch.cluster.service.InternalClusterService$SafeAckedClusterStateTaskListener.onAllNodesAcked(InternalClusterService.java:723)
	at org.elasticsearch.cluster.service.InternalClusterService$AckCountDownListener.onNodeAck(InternalClusterService.java:1003)
	at org.elasticsearch.cluster.service.InternalClusterService$DelegetingAckListener.onNodeAck(InternalClusterService.java:942)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:627)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:762)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: NotSerializableExceptionWrapper[Cannot run program "chmod": error=13, Permission denied]; nested: NotSerializableExceptionWrapper[error=13, Permission denied];
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
	at org.apache.hadoop.util.Shell.run(Shell.java:456)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
	at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
	at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:728)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)


(Costin Leau) #4
  1. Likely you have some other snapshot process happening. This is likely indicated by /snaps/tests
  2. Looks like a permission error - do you have seccomp installed by any chance (double check the ES logs when it starts up). Can the ES user invoke chmod?

#5

Hi @costin, thanks for the reply. I'm confident that there isn't any snapshot operation in progress, because I couldn't create any repository successfully. To check anyway, I hit each repository's /_snapshot/<repo_name>/_status API, and they all return this JSON:

{
  "snapshots": []
}

My nodes don't find the binary seccomp when I try to execute it, so I don't seem to have that installed. Anything particular to look for in ES logs? (The search "seccomp" doesn't give any results.)

Any idea how I can check whether ES can invoke chmod? I tried this:

[root@master2 elasticsearch]# su elasticsearch
This account is currently not available.

When I try this as a normal user, it prompts me for a password, but I haven't explicitly configured any password for the elasticsearch Linux user.

Plus, chmod does have execute permission for all users:

[root@master2 elasticsearch]# ll /bin/chmod
-rwxr-xr-x. 1 root root 58544 Sep 15  2015 /bin/chmod

The user elasticsearch can execute chmod; I did the following to check that:

[root@master2 elasticsearch]# chsh -s /bin/bash elasticsearch
Changing shell for elasticsearch.
Shell changed.
[root@dz-esds-master2 elasticsearch]# su elasticsearch
bash-4.2$ touch /tmp/thing
bash-4.2$ chmod 777 /tmp/thing
bash-4.2$ echo $?
0
bash-4.2$ 

(Sundar Rajan) #6

Just FYI, this discussion has moved to a support ticket. We will update the community on the outcome once the ticket is resolved.


#7

Hi Everyone, the problem was in the JSON request data. The setting

"url": "hdfs://192.168.10.206:9000",

should have been

"uri": "hdfs://192.168.10.206:9000",

Because the JSON used the key "url" instead of "uri", Elasticsearch was trying to create the repository on the local filesystem.
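
For anyone hitting the same thing, here is a minimal Python sketch that builds the corrected request body and flags setting names that do not match the ones used in this thread. The set of expected names is only what appears in the elasticsearch.yml above, not an authoritative list for the plugin, and unexpected_keys is a hypothetical helper.

```python
import json

# Setting names that appear in this thread's elasticsearch.yml.
# Assumption: this is NOT the plugin's complete list of settings.
KEYS_SEEN_IN_THREAD = {
    "uri", "path", "load_defaults", "conf_location",
    "concurrent_streams", "compress", "chunk_size",
}

CONF_FILES = [
    "/etc/elasticsearch/core-site.xml",
    "/etc/elasticsearch/hdfs-site.xml",
]


def unexpected_keys(settings: dict) -> list:
    """Return setting names not in KEYS_SEEN_IN_THREAD, sorted."""
    return sorted(set(settings) - KEYS_SEEN_IN_THREAD)


# Settings as originally sent in post #1 (note the "url" key).
original_settings = {
    "url": "hdfs://192.168.10.206:9000",
    "path": "snaps",
    "conf_location": CONF_FILES,
}

# Corrected settings: same values, with "uri" as the key.
corrected_settings = {
    "uri": "hdfs://192.168.10.206:9000",
    "path": "snaps",
    "conf_location": CONF_FILES,
}

print("original:", unexpected_keys(original_settings))
print("corrected:", unexpected_keys(corrected_settings))

# Request body for the repository creation API.
body = {"type": "hdfs", "settings": corrected_settings}
print(json.dumps(body, indent=2))
```

The printed JSON is the same request as in post #1 with only the key name changed.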

