The Elasticsearch File System Crawler team is pleased to announce the fscrawler-2.3 release!
FS Crawler offers a simple way to index local files into Elasticsearch.
Changes in this version include:
New features:
- fixed JSON, missing comma added Issue: 386. Thanks to Quix0r.
- Add OCR support for PDF documents Issue: 373. Thanks to dadoonet.
- Add bootstrap checks Issue: 371. Thanks to dadoonet.
- Error while indexing content from [path]: entity content is too long [134161442] for the configured buffer limit [104857600] Issue: 345. Thanks to candido1212.
- Add optional Jansi Library Issue: 332. Thanks to dadoonet.
- Add continue_on_error option to continue on error while crawling Issue: 330. Thanks to kneubi.
- Fix links typo Issue: 326. Thanks to soruly.
- fscrawler doesn't work on windows Issue: 289. Thanks to xsallowed.
Fixed Bugs:
- Ingest pipeline should be applied to doc index only Issue: 395. Thanks to dadoonet.
- Not setting updateRate can cause NPE Issue: 382. Thanks to dadoonet.
- Content type detection for rest requests is deprecated when using fscrawler 2.3-SNAPSHOT Issue: 380. Thanks to zekefromde.
- Inconsistent use of 'path' real properties for folders and files Issue: 353. Thanks to trorbyte.
- Folder mapping has name field that is never used Issue: 352. Thanks to trorbyte.
- Timestamp "last modified" and "indexed time" shifted + 2h Issue: 350. Thanks to TheReal1604.
- filename_as_id=true leads to deleted files not getting deleted from the index Issue: 336. Thanks to shadiakiki1986.
- elasticsearch.password and server.password settings are not read Issue: 329. Thanks to dadoonet.
- Windows: Error in character encoding in logs Issue: 325. Thanks to BlacksWise.
- Patch Log4J 2.8 to display messages on Windows Issue: 323. Thanks to dadoonet.
- fscrawler.bat does not have CR-LF and uses %$JAVA_OPTS% instead of %JAVA_OPTS% Issue: 322. Thanks to dadoonet.
- Windows: fscrawler seems to hang Issue: 320. Thanks to BlacksWise.
- file.extension missing if index_content is set to false Issue: 317. Thanks to kneubi.
- [SSH] Error while indexing content from /home/administrateur : Auth fail Issue: 316. Thanks to Moltroon.
- tesseract usage for OCR Issue: 314. Thanks to shadiakiki1986.
- elasticsearch.password option is not read Issue: 312. Thanks to hlecorche.
Changes:
- Update REST Client to 5.5.0 Issue: 403. Thanks to dadoonet.
- Update compatibility to elasticsearch 6.0.0-alpha2 (1st phase) Issue: 384. Thanks to dadoonet.
- Update to 6.0.0-alpha2 Issue: 383. Thanks to dadoonet.
- Update to Apache Tika 1.15 Issue: 378. Thanks to dadoonet.
- Remove path.encoded field Issue: 366. Thanks to dadoonet.
- Update to Elasticsearch 5.4.0 Issue: 365. Thanks to dadoonet.
- Error message when crawling is not correctly worded Issue: 364. Thanks to dadoonet.
- Index file metadata with not enough read rights on a file Issue: 362. Thanks to TheReal1604.
- use StringBuilder in a loop Issue: 361. Thanks to ctamisier.
- Update to elasticsearch 5.3.0 Issue: 355. Thanks to dadoonet.
- fscrawler requests are deprecated by elasticsearch Issue: 354. Thanks to xsallowed.
- add fscrawler_path to the virtual and real paths on folders Issue: 342. Thanks to trorbyte.
- Avoid bulk queue rejection Issue: 339. Thanks to babadofar.
- Update to Elasticsearch 5.2.2 Issue: 338. Thanks to dadoonet.
- Detect if java can't be found (on windows) Issue: 333. Thanks to dadoonet.
- Update to Log4J 2.8.1 Issue: 324. Thanks to dadoonet.
- move file extension code from TikaDocParser to FsCrawlerImpl Issue: 318. Thanks to kneubi.
For a manual installation, you can download the fscrawler-2.3 here:
https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler/2.3/
Have fun!
-Elasticsearch File System Crawler team