I am trying out FSCrawler and would like to know whether the following use case can be achieved with it.
I already have an index/type created with my own custom mappings.
I am using the ingest-attachment plugin along with the ingest processor.
Question: Can I use FSCrawler to index PDF files into a specified index/type/document and a specific field, using its configuration or REST API?
Reason: I have very large documents to index, and my application runs in a Windows ecosystem (using the NEST client). Converting large documents to a base64 string causes memory issues, so as an alternative I would like to check whether FSCrawler can pull the documents directly and index them into a specified index/type, against a specific document ID, in the specific field defined for the attachment type.
Yes, although FSCrawler gives you less flexibility over field names than ingest processors do. The good news is that FSCrawler can send each processed file to Elasticsearch through an ingest pipeline, where you can simply rename a field to whatever target field you want. See GitHub - dadoonet/fscrawler: Elasticsearch File System Crawler (FS Crawler)
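A minimal sketch of the two pieces involved, assuming FSCrawler 2.x (which stores the extracted text in a `content` field and, as far as I know, accepts an `elasticsearch.pipeline` job setting; check the docs for your version). It builds the body you would send with `PUT _ingest/pipeline/<id>` (a single `rename` processor) and a matching job `_settings.json`. The pipeline id `fscrawler_rename`, the target field `attachment_text`, the index `my_index` and the folder path are all hypothetical placeholders.

```python
import json

# Hypothetical names: replace with your own pipeline id, target field and index.
PIPELINE_ID = "fscrawler_rename"
TARGET_FIELD = "attachment_text"


def rename_pipeline(source_field, target_field):
    """Body for PUT _ingest/pipeline/<id>: one rename processor."""
    return {
        "description": "Move FSCrawler's extracted text into a custom field",
        "processors": [
            {"rename": {"field": source_field, "target_field": target_field}}
        ],
    }


def fscrawler_settings(job_name, folder, index, pipeline):
    """Job settings pointing FSCrawler at an existing index and pipeline.

    Assumption: FSCrawler 2.x reads elasticsearch.index and
    elasticsearch.pipeline from the job's _settings.json.
    """
    return {
        "name": job_name,
        "fs": {"url": folder},
        "elasticsearch": {"index": index, "pipeline": pipeline},
    }


if __name__ == "__main__":
    # The pipeline must exist in Elasticsearch before FSCrawler starts.
    print("PUT _ingest/pipeline/" + PIPELINE_ID)
    print(json.dumps(rename_pipeline("content", TARGET_FIELD), indent=2))
    # Hypothetical folder holding the PDFs to crawl.
    settings = fscrawler_settings(
        "attachtest_attach", "D:\\DevStuff\\docs", "my_index", PIPELINE_ID
    )
    print(json.dumps(settings, indent=2))
```

Renaming in the pipeline keeps FSCrawler's own settings minimal; the field mapping stays under your control in Elasticsearch.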
I tried the above suggestion, but the .bat file does not seem to do anything, i.e. FSCrawler does not start up at all. I am on Windows 7 with an Elasticsearch node running on my local machine. The following error appears on the console when I try to terminate the batch job:
D:\DevStuff\FsCrawler\bin>fscrawler --config_dir "D:\DevStuff\FsCrawler\attachtest_attach" attachtest_attach --loop 0 --rest
Exception in thread "main" java.util.NoSuchElementException
at java.util.Scanner.throwFor(Scanner.java:862)
at java.util.Scanner.next(Scanner.java:1371)
at fr.pilato.elasticsearch.crawler.fs.FsCrawler.main(FsCrawler.java:212)
Terminate batch job (Y/N)? y
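The `java.util.Scanner.next` frame suggests FSCrawler was blocked reading from the console when the job was terminated. One possible cause (an assumption, not something the trace proves) is that it did not find the job settings and was waiting for an answer to an interactive prompt. The sketch below checks where the settings file would be expected, assuming FSCrawler 2.x looks for `<config_dir>/<job_name>/_settings.json`; the paths are taken from the command above.

```python
from pathlib import Path


def expected_settings_file(config_dir, job_name):
    # Assumption: FSCrawler 2.x reads <config_dir>/<job_name>/_settings.json
    return Path(config_dir) / job_name / "_settings.json"


def job_exists(config_dir, job_name):
    """True if the settings file is present at the expected location."""
    return expected_settings_file(config_dir, job_name).is_file()


if __name__ == "__main__":
    cfg = r"D:\DevStuff\FsCrawler\attachtest_attach"
    job = "attachtest_attach"
    path = expected_settings_file(cfg, job)
    print(path, "exists" if job_exists(cfg, job) else "is missing")
```

With the command above, the job directory would sit one level below `attachtest_attach`, so a missing nested folder would be the first thing to rule out.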
I verified the Java version as follows:
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)