Hi, we are using Elasticsearch 5.6 with NEST, along with the ingest attachment plugin to do full-text search over the content of various documents (PDF, Word, etc.).
For large files, potentially over 100 MB, how do we implement streaming so that we don't load the whole file into memory?
Can we use chunking in conjunction with the ingest processor?
There is sadly nothing like that. You need to send the full document to Elasticsearch as a single JSON payload, with the binary content base64-encoded inline, so it cannot be streamed or chunked.
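To make the memory cost concrete, here is a minimal sketch (in Python, with the index and pipeline names being placeholders) of the kind of body the ingest attachment processor expects: the whole file is read and base64-encoded up front, so a 100 MB file becomes roughly 133 MB of text in memory before anything is sent.

```python
import base64
import json

def build_attachment_doc(path):
    """Build a JSON body of the shape the ingest attachment
    processor expects: the entire file base64-encoded in a single
    field. There is no way to stream this, which is the memory
    problem described above."""
    with open(path, "rb") as f:
        raw = f.read()  # the entire file ends up in memory
    # base64 output is 4/3 the input size, so ~100 MB -> ~133 MB
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"data": encoded})
```

The resulting body would then be indexed through the attachment pipeline (e.g. `PUT /myindex/mydoc/1?pipeline=attachment` -- index and pipeline names here are hypothetical).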
You can have a look at the FSCrawler project, which exposes a REST endpoint where you can directly upload a binary document. The good thing is that it runs in an external process (not within an Elasticsearch node), so it puts much less pressure on your nodes.
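As a sketch of that route (assuming FSCrawler's REST service on its default `127.0.0.1:8080` with the `_upload` endpoint; adjust host, port, and path for your setup), the file can be sent as multipart/form-data in fixed-size chunks so it never sits in memory whole:

```python
import http.client
import os
import uuid

CHUNK = 1024 * 1024  # read 1 MB of the file at a time

def multipart_chunks(path, boundary):
    """Yield a multipart/form-data body piece by piece, so only one
    CHUNK of the file is in memory at any moment."""
    name = os.path.basename(path)
    yield (f"--{boundary}\r\n"
           f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
           "Content-Type: application/octet-stream\r\n\r\n").encode()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            yield chunk
    yield f"\r\n--{boundary}--\r\n".encode()

def upload(path, host="127.0.0.1", port=8080):
    """POST the file to FSCrawler's _upload endpoint. Passing an
    iterable body makes http.client use chunked transfer encoding
    (Python 3.6+), so the whole file is never loaded at once."""
    boundary = uuid.uuid4().hex
    conn = http.client.HTTPConnection(host, port)
    conn.request(
        "POST", "/fscrawler/_upload",
        body=multipart_chunks(path, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    return conn.getresponse()
```

This is only a sketch under the stated assumptions, not FSCrawler's official client; check the FSCrawler documentation for the exact endpoint and any required parameters for your version.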
Warning: it’s a community tool, not maintained by Elastic.
Thanks for the reply, dadoonet.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.