1. Use fscrawler to crawl my directory with `fs.store_source` enabled (which adds the Base64-encoded document as a field on the message), and send that message to a pipeline on ES that has the ingest-attachment plugin. If it's a static set of files, I may not even need fscrawler: just some simple code to crawl the directory and encode the files before sending them to ES. (Rough sketches of both variants below.)
2. Use fscrawler's own content extraction instead... but I'm not sure exactly what that involves; I find it hard to follow from the docs. Does fscrawler somehow replace the ingest-attachment plugin? The docs seem to suggest it doesn't need that plugin at all.
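For reference, here's roughly what I mean by option 1. This is a minimal sketch: the job name, paths, index and pipeline names are placeholders, and I'm not certain which field name fscrawler uses for the stored Base64 source (the docs mention `attachment`), so the processor's `field` would need to match whatever your fscrawler version actually writes.

```yaml
# _settings.yaml for the FSCrawler job (names and paths are placeholders)
name: "my_docs"
fs:
  url: "/path/to/files"
  store_source: true          # attach the Base64-encoded file to each document
elasticsearch:
  nodes:
    - url: "http://127.0.0.1:9200"
  pipeline: "attachment_pipeline"   # run this ingest pipeline on every document
```

```json
PUT _ingest/pipeline/attachment_pipeline
{
  "description": "Extract text from the Base64-encoded source",
  "processors": [
    {
      "attachment": {
        "field": "attachment"
      }
    }
  ]
}
```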
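And here's the "simple code" alternative I had in mind for a static set of files, sketched with the Python Elasticsearch client (8.x `index()` signature). The index and pipeline names are the same placeholders, and for this variant the pipeline's attachment processor would need its `field` set to `data`:

```python
import base64
import os

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

ROOT = "/path/to/files"            # directory to crawl (placeholder)
INDEX = "my_docs"                  # target index (placeholder)
PIPELINE = "attachment_pipeline"   # ingest pipeline using ingest-attachment

for dirpath, _dirs, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        # Read the binary file and Base64-encode it for the pipeline
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        # The pipeline's attachment processor reads from the "data" field here
        es.index(
            index=INDEX,
            pipeline=PIPELINE,
            document={"filename": name, "path": path, "data": encoded},
        )
```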
Thank you David!
I am wondering, then, which approach would be better for my use case: I have thousands of binary files to index, many gigabytes in total. My current fscrawler settings file uses store_source and sends each file to an ES pipeline that uses the ingest-attachment plugin. I don't use the REST server.
Would it be better to have fscrawler extract the content from the binary docs itself, and then send just that content to a bare ES index (no ingest-attachment pipeline)? And if so, should that go through the REST server, or some other method? (A sketch of what I think that would look like is below.)
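To make the question concrete, my understanding is that this second approach is just a settings file with no `store_source` and no `pipeline` entry at all, so fscrawler does the Tika extraction itself and indexes the extracted text (into the `content` field, if I'm reading the docs right) into a plain index:

```yaml
# _settings.yaml sketch: fscrawler extracts content itself, no ingest pipeline
name: "my_docs"
fs:
  url: "/path/to/files"
elasticsearch:
  nodes:
    - url: "http://127.0.0.1:9200"
```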