FSCrawler reports "doesn't exist" when crawling a Windows folder over SSH

Hi,

I am trying to index attachments with FSCrawler over SSH (port 22).

I have Elasticsearch installed on one machine and am trying to read data from another machine on the same network, but I get a "file does not exist" error.

This is my settings.yaml:

```yaml
---
name: "new_attachment"
server:
  hostname: "**************"
  port: 22
  username: "**********"
  password: "**************"
  protocol: "ssh"
fs:
  url: "\\E\\foldertoindex"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://*************:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
```

The error is as follows:

```
15:21:50,215 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV7] wait for yellow health on index [new_attachment_folder]
15:21:50,223 TRACE [f.p.e.c.f.c.v.ElasticsearchClientV7] health response: {"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":3,"active_primary_shards":1,"active_shards":2,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
15:21:50,227 DEBUG [f.p.e.c.f.FsParserAbstract] creating fs crawler thread [new_attachment] for [E\foldertoindex] every [15m]
15:21:50,231 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler started for [new_attachment] for [E\foldertoindex] every [15m]
15:21:50,231 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler thread [new_attachment] is now running. Run #1...
15:21:50,231 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Opening SSH connection to ******@***********
15:21:50,815 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] SSH connection successful
15:21:50,819 WARN  [f.p.e.c.f.FsParserAbstract] Error while crawling E:\foldertoindex: E:\foldertoindex doesn't exists.
15:21:50,819 WARN  [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.RuntimeException: E:\foldertoindex doesn't exists.
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:130) [fscrawler-core-2.7-SNAPSHOT.jar:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
15:21:50,827 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 15m
15:22:04,786 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [new_attachment]
15:22:04,786 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is still running
15:22:04,786 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is now waking up again...
15:22:04,786 DEBUG [f.p.e.c.f.FsParserAbstract] FS crawler thread [new_attachment] is now marked as closed...
java.lang.Exception: Stack trace
        at java.base/java.lang.Thread.dumpStack(Thread.java:1387)
        at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.close(FsCrawlerImpl.java:140)
        at fr.pilato.elasticsearch.crawler.fs.cli.FSCrawlerShutdownHook.run(FSCrawlerShutdownHook.java:39)
Terminate batch job (Y/N)? Y
```

Regards,
Umesh

Have a look at https://fscrawler.readthedocs.io/en/latest/admin/fs/ssh.html#windows-drives

The URL should be:

url: "/E:/foldertoindex"
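To illustrate the format described in the linked documentation, here is a minimal Java sketch that rewrites a Windows-style path such as `E:\foldertoindex` into the `/E:/foldertoindex` form. The class `WindowsSshPath` and the method `toSshUrl` are hypothetical names made up for this example; they are not part of FSCrawler.

```java
public class WindowsSshPath {
    // Hypothetical helper, not part of FSCrawler:
    // turns "E:\dir\sub" into "/E:/dir/sub".
    public static String toSshUrl(String windowsPath) {
        // Use forward slashes throughout.
        String path = windowsPath.replace('\\', '/');
        // Prefix a drive-letter path ("E:/...") with a leading slash.
        if (path.length() >= 2
                && Character.isLetter(path.charAt(0))
                && path.charAt(1) == ':') {
            path = "/" + path;
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(toSshUrl("E:\\foldertoindex")); // prints /E:/foldertoindex
    }
}
```

A path that already starts with a slash is returned unchanged, so the helper can be applied to a value that is already in the expected form.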

Hi David,

Thanks, the solution worked, but I am now getting the following error:

```
16:59:20,460 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [new_attachment] for [/E:/foldertoindex] every [15m]
16:59:20,460 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler thread [new_attachment] is now running. Run #1...
16:59:20,464 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] Opening SSH connection to DIR\RGadge@NL0123EPC0921.dir.slb.com
16:59:21,225 DEBUG [f.p.e.c.f.c.s.FileAbstractorSSH] SSH connection successful
16:59:21,269 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling /E:/foldertoindex: begin 0, end -1, length 17
16:59:21,273 WARN [f.p.e.c.f.FsParserAbstract] Full stacktrace
java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 17
        at java.lang.String.checkBoundsBeginEnd(String.java:3319) ~[?:?]
        at java.lang.String.substring(String.java:1874) ~[?:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.indexDirectory(FsParserAbstract.java:541) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsParserAbstract.run(FsParserAbstract.java:142) [fscrawler-core-2.7-SNAPSHOT.jar:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
16:59:21,281 DEBUG [f.p.e.c.f.FsParserAbstract] Fs crawler is going to sleep for 15m
```

Regards,
Umesh

Please format your code, logs or configuration files using the </> icon as explained in this guide, not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

If you are not using markdown format, use the </> icon mentioned above.

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

It looks like a bug. Could you share the exact configuration file? You can remove credentials from it.

Could you open an issue with those details?
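For anyone reading the stack trace: `end -1` is what `String.substring(0, end)` complains about when `end` comes from an `indexOf` or `lastIndexOf` call that found nothing, and 17 is exactly the length of `/E:/foldertoindex`. The Java sketch below only reproduces that kind of failure. Searching for a backslash is an assumption chosen for illustration, the class `SubstringRepro` is a made-up name, and the real code at `FsParserAbstract.java:541` is not shown in this thread. The exact wording of the exception message can also differ between JDK versions.

```java
public class SubstringRepro {
    // Illustration only, not FSCrawler code: returns the exception message
    // produced when substring() is given the -1 from a failed lookup.
    public static String reproduce(String path) {
        // Assumption: search for a separator that does not occur in the path.
        int end = path.lastIndexOf('\\'); // -1 when the character is not found
        try {
            return path.substring(0, end);
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce("/E:/foldertoindex"));
    }
}
```

If that is indeed what happens, a check for `-1` before calling `substring` would avoid the exception, but that is for the issue to confirm.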

Hi David,

Thank you, I will open a new issue with the details.

Regards,
Umesh
