Don't return whole Base64-encoded files (ingest plugin)

Hi,

I've been reading a bit about how the ingest plugin works. So far it works correctly for me when searching for words inside PDF files; however, I noticed a problem with large files.

Let's say I search for a PDF that originally weighs 20 MB: the hit comes back with more than 20 MB of Base64-encoded data. Unfortunately that is a huge problem, and I would like Elasticsearch not to return it.
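For scale: standard Base64 encodes every 3 bytes of input as 4 ASCII characters (padding the last group), so the encoded copy is about a third larger than the PDF itself. A small Python sketch of that arithmetic, reading "20 MB" as 20 MiB (an assumption):

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of standard, padded Base64 output for n_bytes of input."""
    # Every started group of 3 input bytes becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    n = 20 * 1024 * 1024  # "20 MB" read as 20 MiB
    encoded = base64_length(n)
    print(encoded, f"{encoded / 1024**2:.1f} MiB")  # 27962028 26.7 MiB
    # Sanity check against the real encoder on a small input.
    assert base64_length(1000) == len(base64.b64encode(bytes(1000)))
```

So a 20 MiB PDF turns into roughly 26.7 MiB of text in every hit that returns that field.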

Is there a way to accomplish this?

Thanks!

Use the remove processor in your ingest pipeline to drop the Base64 field once the attachment processor has extracted the text from it.
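A minimal sketch of such a pipeline, assuming the Base64 payload arrives in a field named `data` (a hypothetical name; use whatever field your documents actually carry). The `attachment` processor writes its output to `attachment` by default, and the `remove` processor then deletes the original field before the document is stored. The Python below only builds the JSON body you would send with `PUT _ingest/pipeline/<id>`; `simulate_remove` is a hypothetical local helper that illustrates the effect and is not part of any Elasticsearch API.

```python
import json

# Hypothetical field name: the Base64 payload is assumed to live in "data".
BINARY_FIELD = "data"

# Body for PUT _ingest/pipeline/<id>: extract text with the attachment
# processor, then drop the Base64 field with the remove processor.
pipeline = {
    "description": "Extract attachment content, then drop the Base64 source",
    "processors": [
        {"attachment": {"field": BINARY_FIELD}},
        {"remove": {"field": BINARY_FIELD}},
    ],
}


def simulate_remove(doc: dict, field: str = BINARY_FIELD) -> dict:
    """Local illustration only: mimic the remove step on a document dict."""
    out = dict(doc)
    # The real remove processor fails on a missing field unless
    # ignore_missing is set; this local sketch simply skips it.
    out.pop(field, None)
    return out


if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
    doc = {"data": "JVBERi0xLjQK", "attachment": {"content": "Hello PDF"}}
    print(simulate_remove(doc))  # {'attachment': {'content': 'Hello PDF'}}
```

With the field removed at ingest time, the extracted `attachment.content` stays searchable, but the Base64 blob is never stored in `_source`, so searches can no longer return it.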

Or have a look at the FSCrawler project, which does not send the binary file to Elasticsearch at all, only the extracted content.

