FSCrawler path to remote dir

Sure, the result:

user@host C:\Users\user>cd d:/TestData

user@host C:\Users\user>cd D:/TestData

It won't switch the directory. I used PuTTY as the SSH client.
Am I missing something?

No. I believe this is an issue with the Windows filesystem. Let me think about it and come back to you.

That's what I thought as well, since you have to switch drives by just using the drive name.

Thank you very much for your effort.

Could you try:

cd /d d:/TestData

That worked! I'm now in the correct directory.

So I need to check whether I can use that switch in FSCrawler. Let me look into it.

You already tried using "url": "d:/TestData", right?
If not, could you try it?

Yes, I just reran it:

12:14:17,579 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Opening SSH connection to xxx
12:14:21,954 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] SSH connection successful
12:13:03,000 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling d:/TestData: d:/TestData doesn't exists.
12:13:03,000 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: d:/TestData doesn't exists.

Does it matter that url isn't in quotes in my config?

No. Sorry, my bad. I was still thinking it was JSON, but I switched the settings to YAML.

Last question (maybe :slight_smile:). Could you run:

ls d:/TestData

No problem with asking, I'm happy you're helping me with my problem :slight_smile:

I can't execute ls commands in PuTTY, since I'm on a Windows machine :confused:

I meant: can you SSH to the machine and run that command from the SSH client?

I did that, but I can't execute ls in the SSH client:

user@host C:\Users\user>ls d:/TestData
'ls' is not recognized as an internal or external command,
operable program or batch file.

user@host C:\Users\user>

Oh, maybe to clarify: the remote server is also a Windows machine.

Yeah. I got that.
Sounds like I need to start a Windows VM and add a patch to the code.

Could you open an issue in https://github.com/dadoonet/fscrawler/issues?

Feature or bug? Sure I can :slight_smile:

It's a bug to me.

I opened it. If you want me to add extra details to the issue I surely can. I mainly linked to this post here.

Can you estimate how much effort it will be to change this, or even when it will work?
Thank you so much :slight_smile:

I don't know. I first need to have a VM running with an ssh server on it. WIP.
But I'm on it. :wink:

I think there is nothing to fix; it just needs better documentation...

Could you try with:

name: "dev_binary"
fs:
  url: "/D:/TestData"
server:
  hostname: "host"
  protocol: "ssh"
  username: "user"
  password: "pw"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"

Thank you very much! Should I update the documentation and submit a pull request?

Working perfectly now.

But somehow, if I change update_rate, it stays at 15 minutes after a restart.
Is my config in the correct format? Seems like update_rate is assigned to server.
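
For reference, here is a trimmed sketch of the layout this question points at. It assumes the FSCrawler settings format in which crawl options such as update_rate and excludes sit under fs, and server holds only the connection details. The values are copied from the config above, except "5m", which is a hypothetical new update rate used for illustration.

```yaml
name: "dev_binary"
fs:
  url: "/D:/TestData"
  update_rate: "5m"      # hypothetical new value; assumed to belong under fs, not server
  excludes:
  - "*/~*"
server:                  # connection details only
  hostname: "host"
  protocol: "ssh"
  username: "user"
  password: "pw"
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
```

If a setting is nested under the wrong key, it may be ignored silently and its default used instead, which would fit the rate staying at 15 minutes.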

I'm on it :wink: