FSCrawler path to remote dir

I'm using FSCrawler by @dadoonet and I'm trying to index files on a remote server.
The SSH connection is established, but it can't find the path. I tried every variant I could think of for the url string.

My config looks like this:

name: "dev_binary"
fs:
  url: "D:\\TestData"
server:
  hostname: "host"
  protocol: "ssh"
  username: "user"
  password: "pw"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"

Any idea or example on how to set the path properly?

I'm on a Windows machine.

I believe that URL should be something like url: "/TestData".

To find the right URL, you should use an SSH client and run something like:

ssh host

Once connected, try to find the right dir.

cd /
ls

Or

cd /TestData

Once you have the right dir, use that name in the url field.

Could it be a problem that when I connect I'm in

C:\Users\user

and I have to use data from D:?

I checked that the folder exists, and I can cd to it:

user@host C:\Users\user>cd /

user@host C:\>D:

user@host D:\>cd TestData

user@host D:\TestData>

It doesn't work when I use url: "/TestData". Note that I'm on a Windows Server...

Could you try to run (using ssh):

cd d:/TestData
ls

Sure, the result:

user@host C:\Users\user>cd d:/TestData

user@host C:\Users\user>cd D:/TestData

It won't switch the directory. I used PuTTY as the SSH client.
Am I missing something?

No. I believe this is an issue with the Windows filesystem. Let me think about it and come back to you.

That's what I thought as well, since you have to switch drives by entering just the drive name.

Thank you very much for your effort.
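
To illustrate why the drive letter matters here, the sketch below uses Python's standard pathlib to show how the same strings are read under Windows and POSIX path rules. It only demonstrates path semantics. How FSCrawler or the SSH server actually resolves the url value is not shown in this thread, so any connection to the failure is an assumption.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Candidate values for fs.url that were tried in this thread.
candidates = ["/TestData", "d:/TestData", "D:\\TestData"]


def describe(path_text):
    """Return how Windows and POSIX rules each interpret path_text."""
    win = PureWindowsPath(path_text)
    posix = PurePosixPath(path_text)
    return {
        "windows_drive": win.drive,             # '' when no drive letter is given
        "windows_absolute": win.is_absolute(),  # needs both a drive and a root
        "posix_absolute": posix.is_absolute(),  # needs a leading '/'
    }


for text in candidates:
    print(text, describe(text))
```

Under Windows rules, "/TestData" carries no drive, so it is resolved against whichever drive is current (C: right after login in the session above), while "d:/TestData" names the drive explicitly. Under POSIX rules, "d:/TestData" is not an absolute path at all.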

Could you try:

cd /d d:/TestData

That worked! I'm now in the correct directory.

So I need to check whether I can use that switch in FSCrawler. Let me check.

You already tried "url": "d:/TestData", right?
If not, could you try it?

Yes, I just reran it:

12:14:17,579 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Opening SSH connection to xxx
12:14:21,954 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] SSH connection successful
12:13:03,000 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling d:/TestData: d:/TestData doesn't exists.
12:13:03,000 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: d:/TestData doesn't exists.

Does it matter that url isn't in quotes in my config?

No. Sorry. My bad. I was still thinking it was JSON, but I switched the settings to YAML.

Last question (maybe 🙂). Could you run:

ls d:/TestData

No problem asking, I'm happy you're helping me with my problem 🙂

I can't execute ls commands in PuTTY, since I'm on a Windows machine 😕

I meant: can you ssh to the machine and run that command from the SSH client?

I did that, but I can't execute ls in the SSH Client:

user@host C:\Users\user>ls d:/TestData
'ls' is not recognized as an internal or external command,
operable program or batch file.

user@host C:\Users\user>
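
ls is a Unix command, so the Windows command shell on the remote host rejects it; the built-in dir command is the closest equivalent. The sketch below builds the remote check either way. The function names and the windows flag are hypothetical helpers for illustration, the host, user, and path are the placeholders from this thread, and it assumes the remote login shell is cmd, as the prompt above suggests.

```python
import subprocess


def build_listing_command(user, host, path, windows):
    """Build an ssh command line that lists a remote directory.

    Hypothetical helper: uses `dir` for a Windows cmd shell and `ls` otherwise.
    """
    if windows:
        # cmd treats '/' as a switch prefix, so use backslashes and quotes.
        remote = 'dir "{}"'.format(path.replace("/", "\\"))
    else:
        remote = 'ls "{}"'.format(path)
    return ["ssh", "{}@{}".format(user, host), remote]


def list_remote_dir(user, host, path, windows):
    """Run the listing over ssh and return (exit code, combined output)."""
    cmd = build_listing_command(user, host, path, windows)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    # Only prints the command; call list_remote_dir(...) to actually run it.
    print(build_listing_command("user", "host", "d:/TestData", windows=True))
```

A non-zero exit code from list_remote_dir would suggest that the directory is not reachable under that name from a fresh SSH session.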

Oh, maybe to clarify: the remote server is also a Windows machine.

Yeah. I got that.
Sounds like I need to start a Windows VM and add a patch to the code.

Could you open an issue in the dadoonet/fscrawler issue tracker on GitHub?

Feature or bug? Sure, I can 🙂

It's a bug to me.

I opened it. If you want me to add extra details to the issue, I certainly can. I mainly linked to this post.

Can you estimate how much effort it will be to change this, or even when it will work?
Thank you so much 🙂

I don't know. I first need to have a VM running with an SSH server on it. WIP.
But I'm on it. 😉