[ANNOUNCEMENT] - Elasticsearch File System Crawler 2.1 released


(David Pilato) #1

The Elasticsearch File System Crawler team is pleased to announce the fscrawler-2.1 release!

FS Crawler offers a simple way to index local files into Elasticsearch.

Changes in this version include:

New features:
o Index file hash/checksum
o Add a single integration test with all known formats
o Add Randomized testing framework
o Add test for XML without text
o Add tests for .doc, .html, .pdf and .rtf
o Add how-to release documentation
o Add test for wav files
o Add unit tests for Tika data extraction
o Add more extracted metadata from Tika
o Add support for elasticsearch 1.x series
o Split tests into unit tests and real integration tests
o Add support for multiple mapping templates depending on elasticsearch version
o Change project version to 2 digits: major.minor
o Let users specify their own mapping file per job
o Automatically copy resource files to the fscrawler config directory at startup
o Add support for elasticsearch 5.0
o Add support for attributes within SSH files
o Add attributes_support option
o Fix invalid index mapping in README.md
o Add owner & group of a file to ES index
o Add "how to download FS Crawler"
o Feature Request: Exclude and Include based on directory or full path name/URL
o Add a test with a docx document
o Add FS_JAVA_OPTS JVM option
o Remove text related to rivers
o Add Travis CI
o Examples should be MD5 encoded and not base64 encoded
o Add a test which checks renamed files detection
o Store only metadata of files in a directory tree
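
To give an idea of what a file hash/checksum looks like in practice, here is a minimal Python sketch. It is not FS Crawler's own code (FS Crawler is a Java project); the function name file_md5_hex and the chunk size are assumptions made for illustration only. It computes the hex-encoded MD5 digest of a file, reading it in chunks so large files do not have to fit in memory.

```python
import hashlib


def file_md5_hex(path, chunk_size=65536):
    """Return the hex-encoded MD5 digest of the file at `path`.

    Hypothetical helper for illustration; not part of FS Crawler.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5_hex("some/file.pdf") returns a 32-character hexadecimal string that could be stored alongside the other file metadata.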

Fixed Bugs:
o NullPointerException when traversing certain directories when running on Windows
o Threads are not closed properly
o Fix failing tests when running in a time zone other than CEST
o add_filesize option is broken
o Files are not removed if more than 10 need to be removed
o FSCrawler is using local dirs instead of remote dirs
o Do not open remote connection for every single dir
o FS Crawler uses 100% CPU time
o Empty settings generate NPE
o Default REST elasticsearch port should be 9200 and not 9300
o Fix SSH tests
o Announcement email should be sent while on the release version
o Fixed not closed SSH connections

Changes:
o Clean up if (logger.isDebugEnabled()) checks
o Move mappings root directory to ~/.fscrawler/{job_name}/_mappings/
o Move _status.json file into the job directory
o Run full integration test suite with Travis
o Update to Elasticsearch 5.0
o Get mappings from resource files instead of Java based mappings
o Upgrade to Apache Tika 1.13
o Don't run integration tests if a cluster is already running on the same test port
o Add support for multiple nodes (only one supported now)

For a manual installation, you can download fscrawler-2.1 here:
https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler/2.1/

Have fun!
-Elasticsearch File System Crawler team

