LaaKii
(Laa Kii)
March 11, 2020, 9:20am
1
I'm using FSCrawler by @dadoonet and I'm trying to index files on a remote server.
The ssh connection is established, but it can't find the path. I tried all variants I could think of in the URL String.
My config looks like this:
name: "dev_binary"
fs:
url: "D:\\TestData"
server:
hostname: "host"
protocol: "ssh"
username: "user"
password: "pw"
update_rate: "15m"
excludes:
- "*/~*"
json_support: false
filename_as_id: false
add_filesize: true
remove_deleted: true
add_as_inner_object: false
store_source: false
index_content: true
attributes_support: false
raw_metadata: false
xml_support: false
index_folders: true
lang_detect: false
continue_on_error: false
ocr:
language: "eng"
enabled: true
pdf_strategy: "ocr_and_text"
follow_symlinks: false
elasticsearch:
nodes:
- url: "http://127.0.0.1:9200"
bulk_size: 100
flush_interval: "5s"
byte_size: "10mb"
Any idea or example on how to set the path properly?
I'm on a Windows machine.
dadoonet
(David Pilato)
March 11, 2020, 10:37am
2
I believe the URL should be something like url: "/TestData".
To find the right URL, you should use a SSH client and run something like:
ssh host
Once connected, try to find the right dir.
cd /
ls
Or
cd /TestData
Once you have the right dir, use that name in the url field.
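To illustrate why the right spelling is hard to guess: SSH servers on Windows expose drive letters in different ways. The sketch below is Python, and `candidate_urls` is a hypothetical helper, not part of FSCrawler. The spellings it lists are assumptions based on common conventions (a drive under the root as some SFTP servers on Windows expect, MSYS-style and Cygwin-style mounts). None of them is confirmed to work with FSCrawler in this thread, so each one has to be checked by hand with an SSH client.

```python
from pathlib import PureWindowsPath
from typing import List


def candidate_urls(windows_path: str) -> List[str]:
    """Return possible spellings of a Windows path for an SSH/SFTP server.

    Hypothetical helper: which spelling (if any) a given server accepts
    is an assumption that has to be verified manually.
    """
    p = PureWindowsPath(windows_path)
    if not p.drive:
        raise ValueError("expected an absolute path with a drive letter")
    letter = p.drive.rstrip(":")    # "D:" -> "D"
    rest = "/".join(p.parts[1:])    # components after the drive root
    return [
        f"/{letter}:/{rest}",                  # drive letter under the root
        f"{letter}:/{rest}",                   # Windows path with forward slashes
        f"/{letter.lower()}/{rest}",           # MSYS-style mount (assumption)
        f"/cygdrive/{letter.lower()}/{rest}",  # Cygwin-style mount (assumption)
        f"/{rest}",                            # path below a server-side root
    ]


for url in candidate_urls("D:\\TestData"):
    print(url)
```

Each printed line is a value to try in the url field, one at a time.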
LaaKii
(Laa Kii)
March 11, 2020, 10:44am
3
Could it be a problem that I land in C:\Users\user when I connect, but have to use data from D:?
I checked that the folder exists, and I can cd to it:
user@host C:\Users\user>cd /
user@host C:\>D:
user@host D:\>cd TestData
user@host D:\TestData>
dadoonet:
I believe that URL should be something like url: "/TestData".
It doesn't work when I use this. Note that I'm on a Windows Server...
dadoonet
(David Pilato)
March 11, 2020, 11:03am
4
Could you try to run (using ssh):
cd d:/TestData
ls
LaaKii
(Laa Kii)
March 11, 2020, 11:06am
5
Sure, the result:
user@host C:\Users\user>cd d:/TestData
user@host C:\Users\user>cd D:/TestData
It won't switch the directory. I used PuTTY as the SSH client.
Am I missing something?
dadoonet
(David Pilato)
March 11, 2020, 11:08am
6
LaaKii:
Am I missing something?
No. I believe this is an issue with the Windows filesystem. Let me think about it and come back to you.
LaaKii
(Laa Kii)
March 11, 2020, 11:09am
7
That's what I thought as well, since on Windows you switch drives by typing just the drive name.
Thank you very much for your effort.
LaaKii
(Laa Kii)
March 11, 2020, 11:10am
9
cd /d d:/TestData
That worked! I'm now in the correct directory
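The reason /d matters: cmd.exe remembers a current directory for each drive, and a plain cd with a path on another drive only updates that remembered directory without switching the active drive. A toy model in Python reproduces the behaviour seen above. The `CmdSession` class is hypothetical, is a simplification of cmd.exe, and only handles absolute paths with a drive letter.

```python
from pathlib import PureWindowsPath


class CmdSession:
    """Toy model of how cmd.exe tracks one current directory per drive."""

    def __init__(self, start: str):
        p = PureWindowsPath(start)
        self.drive = p.drive.upper()          # active drive, e.g. "C:"
        self.cwd_by_drive = {self.drive: p}   # remembered directory per drive

    def cd(self, target: str, switch_drive: bool = False) -> None:
        p = PureWindowsPath(target)
        drive = p.drive.upper()
        if not drive:
            raise ValueError("this toy model needs a path with a drive letter")
        # Remember the directory for the target drive.
        self.cwd_by_drive[drive] = PureWindowsPath(drive + "\\").joinpath(*p.parts[1:])
        if switch_drive:                      # corresponds to "cd /d"
            self.drive = drive

    def prompt(self) -> str:
        return str(self.cwd_by_drive[self.drive]) + ">"


session = CmdSession("C:\\Users\\user")
session.cd("d:/TestData")                     # like: cd d:/TestData
print(session.prompt())                       # C:\Users\user>
session.cd("d:/TestData", switch_drive=True)  # like: cd /d d:/TestData
print(session.prompt())                       # D:\TestData>
```

This is why the earlier cd d:/TestData appeared to do nothing: the prompt stayed on C: even though the directory remembered for D: had changed.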
dadoonet
(David Pilato)
March 11, 2020, 11:11am
10
So I need to see whether I can use that switch in FSCrawler. Let me check.
You already tried "url": "d:/TestData", right?
If not, could you try it?
LaaKii
(Laa Kii)
March 11, 2020, 11:14am
11
Yes, I just reran it:
12:14:17,579 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Opening SSH connection to xxx
12:14:21,954 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] SSH connection successful
12:13:03,000 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling d:/TestData: d:/TestData doesn't exists.
12:13:03,000 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: d:/TestData doesn't exists.
Does it matter that url isn't in quotes in my config?
dadoonet
(David Pilato)
March 11, 2020, 11:16am
12
No. Sorry. My bad. I was still thinking it was JSON, but I switched the settings to YAML.
Last question (maybe). Could you run:
ls d:/TestData
LaaKii
(Laa Kii)
March 11, 2020, 11:18am
13
dadoonet:
Last question (maybe). Could you run:
ls d:/TestData
No problem with asking, I'm happy that you're helping me with my problem.
I can't execute ls commands in PuTTY, since I'm on a Windows machine.
dadoonet
(David Pilato)
March 11, 2020, 11:20am
14
I meant: can you ssh to the machine and run that command from the SSH client?
LaaKii
(Laa Kii)
March 11, 2020, 11:22am
15
I did that, but I can't execute ls in the SSH client:
user@host C:\Users\user>ls d:/TestData
'ls' is not recognized as an internal or external command,
operable program or batch file.
user@host C:\Users\user>
Oh, maybe to clarify: the remote server is also a Windows machine.
dadoonet
(David Pilato)
March 11, 2020, 11:29am
16
LaaKii:
Oh, maybe to clarify: the remote server is also a Windows machine.
Yeah. I got that.
Sounds like I need to start a Windows VM and add a patch to the code.
Could you open an issue in Issues · dadoonet/fscrawler · GitHub?
LaaKii
(Laa Kii)
March 11, 2020, 11:30am
17
dadoonet:
Could you open an issue in Issues · dadoonet/fscrawler · GitHub
Feature or bug? Sure, I can.
LaaKii
(Laa Kii)
March 11, 2020, 12:03pm
19
I opened it. If you want me to add extra details to the issue, I can. I mainly linked to this post.
Can you estimate how much effort the change will be, or when it might work?
Thank you so much.
dadoonet
(David Pilato)
March 11, 2020, 12:23pm
20
I don't know. I first need to have a VM running with an ssh server on it. WIP.
But I'm on it.