[ANNOUNCEMENT] - FSCrawler 2.8 released

dadoonet · December 14, 2021, 6:31am

The FSCrawler team is pleased to announce the FSCrawler 2.8 release!

FSCrawler

FS Crawler offers a simple way to index binary files into Elasticsearch.

Usage

wget https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es7/2.8/fscrawler-es7-2.8.zip

Start FS crawler with:

bin/fscrawler job_name

FS crawler will read a local file (defaults to ~/.fscrawler/{job_name}/_settings.json).
If the file does not exist, FS crawler will offer to create your first job.

$ bin/fscrawler job_name
18:28:58,174 WARN  [f.p.e.c.f.FsCrawler] job [job_name] does not exist
18:28:58,177 INFO  [f.p.e.c.f.FsCrawler] Do you want to create it (Y/N)?
y
18:29:05,711 INFO  [f.p.e.c.f.FsCrawler] Settings have been created in [~/.fscrawler/job_name/_settings.json]. Please review and edit before relaunch

Create a directory named /tmp/es or c:\tmp\es, add the files you want to index to it, and start FS crawler again:

$ bin/fscrawler job_name
18:30:34,330 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
18:30:34,332 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
18:30:34,682 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started for [job_name] for [/tmp/es] every [15m]
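
For reference, the directory setup described before this last run could look like the sketch below. It assumes a Linux or macOS shell, and sample.txt is only a hypothetical placeholder for the documents you actually want to index.

```shell
# Assumption: Linux or macOS shell; on Windows, create c:\tmp\es instead.
# Create the directory that the job crawls (see the [/tmp/es] entry in the log above).
mkdir -p /tmp/es

# Hypothetical sample document; replace it with the files you want to index.
printf 'Hello from FSCrawler\n' > /tmp/es/sample.txt

# Confirm the file is in place before restarting the crawler.
ls /tmp/es
```

Once the directory contains files, run bin/fscrawler job_name again as shown above.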

New features

#1310: Update ocr.rst, the path was wrong and not working. Thanks to sahin52.
#1256: Add section Workaround for huge temporary files. Thanks to dfbm.

Fixed Bugs

#1286: Fix starting fscrawler with Docker. Thanks to dadoonet.
#1271: fix: not working optional libraries (e.g. jpeg2000). Thanks to NickUfer.
#1252: Add procps apt package to container install. Thanks to cwperry.
#1229: File logs missing in docker container. Thanks to helsonxiao.

Changes

#1320: Bump log4j-core from 2.14.1 to 2.15.0. Thanks to dependabot[bot].
#1198: Update to Tika 2.1. Thanks to dadoonet.

Have fun!
-FSCrawler team

HaroldH · December 15, 2021, 11:10am

Hi,

Are you sure? The folder https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es7/2.8/ does not exist when I browse there. And the latest version in the Central Repository (fr/pilato/elasticsearch/crawler/fscrawler-es7) is still 2.7.

Regards,

dadoonet · December 15, 2021, 2:26pm

It turns out that fixing this problem is more complicated than I thought.
Thank you so much for reporting this.

In the meantime, the latest 2.8-SNAPSHOT version could be used.
The Docker image is ok.

HaroldH · December 21, 2021, 3:29pm

Hi David,

Thanks for the reply. Will there be a fix released in the near future? Or will this be part of a bigger release? I'd rather not run production on a snapshot version.