FSCrawler path to remote dir

So I need to see if I can use that switch in FSCrawler. Let me check.

You already tried "url": "d:/TestData", right?
If not, could you try it?

Yes, I just reran it:

12:14:17,579 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Opening SSH connection to xxx
12:14:21,954 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] SSH connection successful
12:13:03,000 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling d:/TestData: d:/TestData doesn't exists.
12:13:03,000 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: d:/TestData doesn't exists.

Does it matter that url isn't in quotes in my config?

No. Sorry, my bad. I was still thinking it was JSON, but I switched the settings to YAML.

Last question (maybe :slight_smile:). Could you run:

ls d:/TestData

No problem at all, I'm happy you're helping me with my problem :slight_smile:

I can't execute ls commands in PuTTY, since I'm on a Windows machine :confused:

I meant: can you SSH to the machine and run that command from the SSH client?

I did that, but I can't execute ls in the SSH Client:

user@host C:\Users\user>ls d:/TestData
'ls' is not recognized as an internal or external command,
operable program or batch file.

user@host C:\Users\user>
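(Side note, not part of the original exchange: the Windows command prompt doesn't ship an `ls` command; its built-in equivalent is `dir`, so the same check could have been done with:)

```
C:\Users\user>dir d:\TestData
```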

Oh, maybe to clarify: the remote server is also a Windows machine.

Yeah, I got that.
Sounds like I need to start a Windows VM and add a patch to the code.

Could you open an issue in https://github.com/dadoonet/fscrawler/issues?

Feature or bug? Sure I can :slight_smile:

It's a bug to me.

I opened it. If you want me to add extra details to the issue I surely can. I mainly linked to this post here.

Can you estimate how much effort it will be to change this, or even when it will work?
Thank you so much :slight_smile:

I don't know. I first need to have a VM running with an ssh server on it. WIP.
But I'm on it. :wink:

I think there is nothing to fix, just some better documentation to add...

Could you try with:

name: "dev_binary"
fs:
  url: "/D:/TestData"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
server:
  hostname: "host"
  protocol: "ssh"
  username: "user"
  password: "pw"
elasticsearch:
  nodes:
  - url: ""
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"

Thank you very much! Should I update the documentation and submit a pull request?

Working perfectly now.

But somehow, if I change the update_rate, it stays at 15 minutes after a restart?
Is my config in the correct format? It seems like update_rate is assigned to server.

I'm on it :wink:

update_rate is just the pause duration between two internal runs once FSCrawler has started.

Yes, that's what I thought too.
After setting it to 1m, should it start every 1m?

15:12:15,171 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler started for [dev_binary] for [/D:/TestData] every [15m]

It will start a new run after it has finished the current run and with a delay of 1 minute.
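In other words (a rough pseudo-shell sketch of the behavior described above, not FSCrawler's actual code; `run_one_crawl` is a placeholder name):

```
while true; do
  run_one_crawl    # one full pass over the directory tree (placeholder)
  sleep 60         # then pause update_rate ("1m") before the next pass starts
done
```

So update_rate is a delay between runs, not a fixed start schedule.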

Aaaaah, ok. Is there a way to set a schedule for when FSCrawler should look for new / edited files?
I thought that's the way to go :slight_smile:

You can put FSCrawler in a crontab I guess and run it with the option --loop 1. It will exit after one run.

See https://fscrawler.readthedocs.io/en/latest/admin/cli-options.html#loop
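To sketch that approach (the install path and the every-15-minutes schedule are my assumptions; the job name comes from the log above, and --loop 1 from the docs link):

```
# Run FSCrawler every 15 minutes; --loop 1 makes it exit after a single pass
*/15 * * * * /opt/fscrawler/bin/fscrawler dev_binary --loop 1
```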
