I'm using FSCrawler by @dadoonet and I'm trying to crawl a directory over SSH. SSH is configured on my server, so I can reach the directory and run FSCrawler against it. The steps I have followed are:
1. Created a public/private key pair with the following command (I have tried key sizes 2048 and 4096):
   `ssh-keygen -t rsa -b 2048 -C sshserver@gtsi -f sshGTSI`
2. Converted the private key to PEM format with:
   `ssh-keygen -e -p -f sshGTSI -m pem`
3. Added the public key on the server (Ubuntu) to `~/.ssh/authorized_keys`.
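As a side note, the generation and conversion steps can be collapsed into one: recent OpenSSH versions accept `-m PEM` at key-creation time, so the key is written in the classic PEM container from the start. A sketch, reusing the comment and file name from above (the empty passphrase is my assumption):

```shell
# Generate a 4096-bit RSA key pair directly in classic PEM format
# (-N "" sets an empty passphrase; drop it to be prompted instead)
ssh-keygen -t rsa -b 4096 -m PEM -N "" -C "sshserver@gtsi" -f sshGTSI

# The private key should now start with the classic PEM header
head -1 sshGTSI
```

If the first line reads `-----BEGIN RSA PRIVATE KEY-----`, the key is in the format the separate conversion step was meant to produce.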
For this step I followed the documentation's instructions for configuring SSH with a PEM file. Since the whole point of using a key pair is to avoid passwords, I suspect there is a small error in the SSH-with-PEM part of the documentation: if I follow the instructions as written, FSCrawler connects correctly, but as soon as I remove the password option I get an authentication error.
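Concretely, the variant I expect to work without a password is the documented `server` block with the password line removed entirely — a sketch using the (redacted) values from my settings:

```yaml
server:
  hostname: "#ipserver"    # redacted IP of my server
  username: "userserver"
  protocol: "ssh"
  port: 22
  pem_path: "C:\\Users\\user\\.ssh\\ssh_GTSI.pem"
```

With this block FSCrawler fails with the `Auth fail` shown below; adding `password` back makes it connect.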
```
15:46:25,677 WARN [f.p.e.c.f.c.s.FileAbstractorSSH] Cannot connect with SSH to userServer@#ipserver: Auth fail
15:46:25,677 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling /path/crawling/Document: Auth fail
15:46:25,677 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
com.jcraft.jsch.JSchException: Auth fail
	at com.jcraft.jsch.Session.connect(Session.java:519) ~[jsch-0.1.55.jar:?]
	at com.jcraft.jsch.Session.connect(Session.java:183) ~[jsch-0.1.55.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.crawler.ssh.FileAbstractorSSH.openSSHConnection(FileAbstractorSSH.java:144) ~[fscrawler-crawler-ssh-2.9.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.crawler.ssh.FileAbstractorSSH.open(FileAbstractorSSH.java:117) ~[fscrawler-crawler-ssh-2.9.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:130) [fscrawler-core-2.9.jar:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
15:46:25,682 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 30s
```
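The stack trace shows FSCrawler 2.9 bundling jsch-0.1.55, which as far as I can tell only parses private keys in the classic PEM container, not OpenSSH's newer format. So one thing worth ruling out is that the conversion step didn't actually take effect. A small probe (the `key_format` helper and `demo_key` file are just for illustration; the real check is running it against `ssh_GTSI.pem`):

```shell
# Prints which on-disk container a private key uses; JSch 0.1.55 only
# reads the classic PEM variants.
key_format() {
  case "$(head -1 "$1")" in
    *"BEGIN RSA PRIVATE KEY"*)     echo "classic PEM (JSch-compatible)" ;;
    *"BEGIN OPENSSH PRIVATE KEY"*) echo "new OpenSSH format (JSch cannot read it)" ;;
    *)                             echo "other/unknown" ;;
  esac
}

# Demo on a synthetic file standing in for the real key:
printf -- '-----BEGIN OPENSSH PRIVATE KEY-----\n' > demo_key
key_format demo_key
```

If the real key reports the new OpenSSH format, `ssh` on the command line would still accept it while JSch fails, which would match the symptoms above.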
These are my settings:

```yaml
name: "ssh_test"
fs:
  #url: "\\tmp\\es"
  url: "/path/crawling/Documentos"
  update_rate: "30s"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
server:
  hostname: "#ipserver"
  username: "userserver"
  # password: "passworduserserver"
  pem_path: "C:\\Users\\user\\.ssh\\ssh_GTSI.pem"
  protocol: "ssh"
  port: 22
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: true
```
I tested the current SSH configuration from my terminal with `ssh -i sshGTSI.pem userserver@#ipserver -p 22`, and the connection was successful.
I have tried this with two versions of FSCrawler (2.9 and 2.10).