FSCrawler does not index to Elasticsearch over HTTPS

Here is the FSCrawler configuration:

name: "docscluster"
fs:
  url: "/etc/STORE/"
  update_rate: "2s"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: true
  add_filesize: true
  remove_deleted: false
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: true
  indexed_chars: "100%"
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
    #path: "F:\\Tesseract-OCR"
    #data_path: "F:\\Tesseract-OCR\\tessdata"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "https://SERVER1-HOSTNAME:9200"
  - url: "https://SERVER2-HOSTNAME:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
```

Both ES nodes are running successfully in a cluster.
When starting FSCrawler, these are the only logs produced:

```
10:25:09,657 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [112.3mb/1.7gb=6.34%], RAM [1.4gb/7.7gb=18.03%], Swap [1.4gb/1.9gb=74.66%].
10:25:13,630 INFO  [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.9.0
10:25:14,139 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
10:25:14,139 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
10:25:14,869 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler started for [docscluster] for [/etc/STORE/] every [2s]
```

There is no further progress, neither in the logs nor on the ES nodes (no data is indexed).
`_cat/indices` shows:

```
yellow open docscluster_folder BkNAD1nRTU2WRmzMCcnzag 1 1   33101 2222148 194.9mb 194.9mb
yellow open docscluster        VcZI8N6iQ-eOpQI9U4QesA 1 1       0       0    208b    208b
```
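
For reference, that listing can be reproduced with something like the following (`-k`, which skips TLS certificate verification, and the `elastic:changeme` credentials are assumptions about the cluster's security setup):

```sh
curl -k -u elastic:changeme "https://SERVER1-HOSTNAME:9200/_cat/indices?v"
```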

What could be the issue, please?

Are you able to manually index documents (e.g. with curl) using both of those URLs from the same machine you're running FSCrawler on? Just to rule out any network/firewall/auth issues.
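
Something along these lines should succeed against each node (a sketch: `elastic:changeme` stands in for your real credentials, and `-k` skips TLS certificate verification):

```sh
# Index a trivial test document into the job's index
curl -k -u elastic:changeme -X POST "https://SERVER1-HOSTNAME:9200/docscluster/_doc" \
  -H "Content-Type: application/json" \
  -d '{"content": "connectivity test"}'
```

Then repeat the same request against `SERVER2-HOSTNAME`.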

Some comments.

`update_rate: "2s"`

That's way too low: it means FSCrawler will scan the hard disk for new files all the time. I'm not sure you want this, but it should not be the cause of this issue.
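
Something like the following would be more usual (15 minutes is only an illustrative value, not a recommendation specific to your setup):

```yaml
fs:
  update_rate: "15m"
```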

Try starting FSCrawler with the --restart option.
If it still does not work, try --restart --debug and share the logs here.
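
Assuming a standard install where the launcher lives at `bin/fscrawler` (adjust the path to your environment), that would look like:

```sh
# Wipe the job status and re-scan from scratch, with debug logging
bin/fscrawler docscluster --restart --debug
```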

Please format your code, logs, or configuration files using the </> icon as explained in this guide, not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

If you are not using markdown format, use the </> icon in the editor toolbar.

There's a live preview panel for exactly this reason.

This is working now. The issue was an internal firewall; opening the port fixed it.
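
(For anyone hitting the same symptom: a quick reachability check from the FSCrawler machine, assuming curl is available, is shown below. A JSON banner or an authentication error means the port is open; a connection timeout points at a firewall.)

```sh
curl -k "https://SERVER1-HOSTNAME:9200"
```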
