[ANNOUNCEMENT] - Elasticsearch File System Crawler 2.3 released

The Elasticsearch File System Crawler team is pleased to announce the fscrawler-2.3 release!

FS Crawler offers a simple way to index local files into Elasticsearch.

Changes in this version include:

New features:

  • fixed JSON, missing comma added Issue: 386. Thanks to Quix0r.
  • Add OCR support for PDF documents Issue: 373. Thanks to dadoonet.
  • Add bootstrap checks Issue: 371. Thanks to dadoonet.
  • Error while indexing content from [path]: entity content is too long [134161442] for the configured buffer limit [104857600] Issue: 345. Thanks to candido1212.
  • Add optional Jansi Library Issue: 332. Thanks to dadoonet.
  • Add continue_on_error option to continue on error while crawling Issue: 330. Thanks to kneubi.
  • Fix links typo Issue: 326. Thanks to soruly.
  • fscrawler doesn't work on windows Issue: 289. Thanks to xsallowed.

Fixed Bugs:

  • Ingest pipeline should be applied to doc index only Issue: 395. Thanks to dadoonet.
  • Not setting updateRate can cause NPE Issue: 382. Thanks to dadoonet.
  • Content type detection for rest requests is deprecated when using fscrawler 2.3-SNAPSHOT Issue: 380. Thanks to zekefromde.
  • Inconsistent use of 'path' real properties for folders and files Issue: 353. Thanks to trorbyte.
  • Folder mapping has name field that is never used Issue: 352. Thanks to trorbyte.
  • Timestamp "last modified" and "indexed time" shifted + 2h Issue: 350. Thanks to TheReal1604.
  • filename_as_id=true leads to deleted files not getting deleted from the index Issue: 336. Thanks to shadiakiki1986.
  • elasticsearch.password and server.password settings are not read Issue: 329. Thanks to dadoonet.
  • Windows: Error in character encoding in logs Issue: 325. Thanks to BlacksWise.
  • Patch Log4J 2.8 to display messages on Windows Issue: 323. Thanks to dadoonet.
  • fscrawler.bat does not have CR-LF and uses %$JAVA_OPTS% instead of %JAVA_OPTS% Issue: 322. Thanks to dadoonet.
  • Windows: fscrawler seems to hang Issue: 320. Thanks to BlacksWise.
  • file.extension missing if index_content is set to false Issue: 317. Thanks to kneubi.
  • [SSH] Error while indexing content from /home/administrateur : Auth fail Issue: 316. Thanks to Moltroon.
  • tesseract usage for OCR Issue: 314. Thanks to shadiakiki1986.
  • elasticsearch.password option is not read Issue: 312. Thanks to hlecorche.

Changes:

  • Update to REST Client to 5.5.0 Issue: 403. Thanks to dadoonet.
  • Update compatibility to elasticsearch 6.0.0-alpha2 (1st phase) Issue: 384. Thanks to dadoonet.
  • Update to 6.0.0-alpha2 Issue: 383. Thanks to dadoonet.
  • Update to Apache Tika 1.15 Issue: 378. Thanks to dadoonet.
  • Remove path.encoded field Issue: 366. Thanks to dadoonet.
  • Update to Elasticsearch 5.4.0 Issue: 365. Thanks to dadoonet.
  • Error message when crawling is not correctly worded Issue: 364. Thanks to dadoonet.
  • Index file metadata with not enough read rights on a file Issue: 362. Thanks to TheReal1604.
  • use StringBuilder in a loop Issue: 361. Thanks to ctamisier.
  • Update to elasticsearch 5.3.0 Issue: 355. Thanks to dadoonet.
  • fscrawler requests are deprecated by elasticsearch Issue: 354. Thanks to xsallowed.
  • add fscrawler_path to the virtual and real paths on folders Issue: 342. Thanks to trorbyte.
  • Avoid bulk queue rejection Issue: 339. Thanks to babadofar.
  • Update to Elasticsearch 5.2.2 Issue: 338. Thanks to dadoonet.
  • Detect if java can't be found (on windows) Issue: 333. Thanks to dadoonet.
  • Update to Log4J 2.8.1 Issue: 324. Thanks to dadoonet.
  • move file extension code from TikaDocParser to FsCrawlerImpl Issue: 318. Thanks to kneubi.

For a manual installation, you can download fscrawler-2.3 here:
https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler/2.3/

Have fun!
-Elasticsearch File System Crawler team

Hi. Is this an official Elasticsearch solution, or a community one? Is it part of the stack? Thanks.

It's a community project I started in 2011.