[ANNOUNCEMENT] - FSCrawler 2.6 released

The FSCrawler team is pleased to announce the FSCrawler 2.6 release!

FSCrawler

FSCrawler offers a simple way to index binary files into Elasticsearch.

Usage

Download FSCrawler 2.6:

wget https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es6/2.6/fscrawler-es6-2.6.zip

Unzip the archive and start FSCrawler with:

bin/fscrawler job_name

FSCrawler reads its job settings from a local file (by default ~/.fscrawler/{job_name}/_settings.json).
If the file does not exist, FSCrawler offers to create your first job:

$ bin/fscrawler job_name
18:28:58,174 WARN  [f.p.e.c.f.FsCrawler] job [job_name] does not exist
18:28:58,177 INFO  [f.p.e.c.f.FsCrawler] Do you want to create it (Y/N)?
y
18:29:05,711 INFO  [f.p.e.c.f.FsCrawler] Settings have been created in [~/.fscrawler/job_name/_settings.json]. Please review and edit before relaunch

Create a directory named /tmp/es (or c:\tmp\es on Windows), add the files you want to index to it, and start FSCrawler again:

$ bin/fscrawler job_name
18:30:34,330 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
18:30:34,332 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
18:30:34,682 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started for [job_name] for [/tmp/es] every [15m]
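
On Linux or macOS, the preparation step above might look like the following sketch. The sample file name hello.txt and its content are hypothetical; any document you want indexed can be copied into /tmp/es the same way.

```shell
# Create the crawl directory that the generated job settings point to (Linux/macOS).
mkdir -p /tmp/es
# Add a sample document to index; the file name and content are only an example.
printf 'Hello FSCrawler\n' > /tmp/es/hello.txt
# List the directory to check what FSCrawler will find on its next run.
ls -l /tmp/es
```

With the files in place, running bin/fscrawler job_name again starts the crawl shown in the log output above.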

More details in the documentation.

New features

  • #648: Add documentation on how to run as a Windows service. Thanks to dadoonet.
  • #633: Ignore dirs when .fscrawlerignore file is detected. Thanks to dadoonet.
  • #631: Support multiple OCR languages. Thanks to dadoonet.
  • #616: Create specific Elasticsearch clients. Thanks to dadoonet.
  • #611: Add Release Drafter to automatically generate the release notes. Thanks to dadoonet.
  • #597: Add LGTM code quality badges. Thanks to xcorail.
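
As a sketch of how #633 could be used, the commands below mark a subdirectory of the crawl root so that FSCrawler skips it. The private directory name is hypothetical, and the sketch assumes, as the issue title suggests, that the mere presence of a .fscrawlerignore file is what counts, so an empty file is enough.

```shell
# Hypothetical layout: /tmp/es/private holds files that should not be indexed.
mkdir -p /tmp/es/private
# Assumption based on #633: an (empty) .fscrawlerignore file makes FSCrawler ignore this directory.
touch /tmp/es/private/.fscrawlerignore
```
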

Fixed bugs

  • #658: Exit as soon as we close FSCrawler. Thanks to dadoonet.
  • #647: Add a warning when using both silent and debug/trace. Thanks to dadoonet.
  • #646: --silent with no job specified should warn the user. Thanks to dadoonet.
  • #610: Add a Noop Parser. Thanks to dadoonet.
  • #608: Cannot start the REST Service standalone. Thanks to dadoonet.
  • #605: Cannot stop FSCrawler while crawling the local FS. Thanks to dadoonet.
  • #595: Make default root dir Windows compatible. Thanks to dadoonet.
  • #593: Support XML recurring structures. Thanks to dadoonet.

Changes

  • #657: Update Jackson to 2.9.8. Thanks to dadoonet.
  • #655: Update to Tika 1.20. Thanks to dadoonet.
  • #649: Update to Elasticsearch 6.5.3. Thanks to dadoonet.
  • #645: Update Guava transitive dependency to 27.0.1-jre. Thanks to dadoonet.
  • #644: Force the default number of shards to be 1. Thanks to dadoonet.
  • #642: Check Elasticsearch 6 minor version. Thanks to dadoonet.
  • #638: Revisit Elasticsearch.Node and Rest settings. Thanks to dadoonet.
  • #637: Update to Elasticsearch 6.5.1. Thanks to dadoonet.
  • #624: Update Tika to 1.19.1. Thanks to dadoonet.
  • #609: Dump stack when not able to close FSCrawler. Thanks to dadoonet.
  • #604: Update ossindex-maven-plugin to 3.0.1. Thanks to dadoonet.
  • #603: Update to Tika 1.19. Thanks to dadoonet.
  • #602: Update to Jackson 2.9.7. Thanks to dadoonet.
  • #594: Update to Elasticsearch 6.4.1. Thanks to dadoonet.

Have fun!
-FSCrawler team