Hi, we are using Elasticsearch 5.6 and NEST, along with the ingest attachment plugin, to do full-text content search of various documents like PDF, Word, etc.
For large files, potentially above 100 MB in size, how do we implement streaming so that we don't load the whole file into memory?
Can we use chunking in conjunction with the ingest processor?
Thanks
There is sadly nothing like that. You need to provide a full JSON document to Elasticsearch.
No streaming.
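To illustrate why: the ingest attachment processor expects the file's bytes base64-encoded inside a field of a single JSON document, so the whole file necessarily ends up in memory before it is sent. A minimal sketch in Python (the field name `data`, and the use of Python rather than your NEST/C# stack, are just for illustration):

```python
import base64
import json

def build_attachment_doc(path):
    # The ingest attachment processor reads the file content from a
    # base64-encoded field of ONE JSON document. That means the entire
    # file must be read and encoded up front -- there is no streaming API.
    with open(path, "rb") as f:
        data = f.read()  # a 100 MB file becomes ~133 MB of base64 text
    return json.dumps({"data": base64.b64encode(data).decode("ascii")})
```

This is the body you would send to an index whose default pipeline uses the attachment processor; chunking the file would only produce several partial documents, each parsed independently, which is why chunking doesn't combine with the ingest processor.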
You can take a look at the FSCrawler project, which exposes a REST endpoint where you can directly upload a binary document. The good thing is that it runs in an external process (not within an Elasticsearch node), so it puts much less pressure on your nodes.
Warning: it's a community tool, not maintained by Elastic.