Store binary files in Elasticsearch

For my project, we are storing the data in Elasticsearch.
I have a new requirement regarding documents (i.e. the actual binaries):

  1. Store documents and be able to search the text within the document binary.
  2. Allow the user to download the document after I display the search results.

I came across a few discussions mentioning that Elasticsearch is not designed to store big BLOBs.
I need a suggestion on whether I should store the data in Elasticsearch or on a file system.

Welcome to our community! :smiley:

If you decode the blobs and then index the extracted text, Elasticsearch can search within it. Otherwise it's not going to be worth it. You would then also store the metadata there so you can search on that.

Take a look at the FSCrawler project as well. It could help you.

Thank you for the quick response.

Can we possibly run into performance issues later due to big file sizes?

How big?

Storing large binary objects in Elasticsearch is not recommended. Instead, index the extracted text together with the location of the binary object, e.g. on S3, so you can retrieve the original from there when needed.
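To illustrate the idea, here is a minimal sketch of what such an indexed document could look like: the searchable text plus a pointer to the binary, with no binary content in the document itself. The field names (`content`, `storage_uri`, and so on), the bucket, and the key are assumptions made up for this example, not anything prescribed by Elasticsearch. The text extraction step is assumed to have happened already.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IndexedDocument:
    """What goes into Elasticsearch: text and metadata, never the binary."""
    title: str
    content: str        # text already extracted from the binary
    content_type: str
    size_bytes: int
    storage_uri: str    # where the original binary lives (hypothetical S3 URI)


def build_index_body(title, extracted_text, content_type, size_bytes, bucket, key):
    """Build the JSON body to index; bucket and key are illustrative names."""
    doc = IndexedDocument(
        title=title,
        content=extracted_text,
        content_type=content_type,
        size_bytes=size_bytes,
        storage_uri=f"s3://{bucket}/{key}",
    )
    return asdict(doc)


body = build_index_body(
    title="Onboarding guide",
    extracted_text="Welcome to the team. This guide covers your first week.",
    content_type="application/pdf",
    size_bytes=2_500_000,
    bucket="kb-documents",           # hypothetical bucket
    key="guides/onboarding.pdf",     # hypothetical object key
)
print(json.dumps(body, indent=2))
```

A search hit then returns `storage_uri`, and the application uses it to fetch the file from the object store for download, so Elasticsearch never has to serve the binary.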

This requirement is to build a knowledge exchange site with images, Word documents, PDFs, etc. The number of files will increase over time. The expected file size is up to 10 MB as of now.

Maybe you should look at Elastic Workplace Search then.

My application is on .NET Core.
I am also going through this blog post on integration using NEST: "The Future of Attachments for Elasticsearch and .NET" (Elastic Blog).
It is based on the ingest attachment processor plugin.
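For context, this is roughly what the request bodies for that approach look like, sketched in Python rather than NEST. The pipeline name `attachment` and the field name `data` are assumptions chosen for the example. The pipeline body would be sent with `PUT _ingest/pipeline/attachment`, and the document would be indexed with `?pipeline=attachment` on the index request. The sketch only builds the JSON and does not talk to a cluster.

```python
import base64
import json

PIPELINE_ID = "attachment"  # hypothetical pipeline name


def build_pipeline_body(source_field="data"):
    """Pipeline that extracts text from a base64 field, then drops that field."""
    return {
        "description": "Extract text from base64-encoded attachments",
        "processors": [
            # The attachment processor reads base64 content from source_field.
            {"attachment": {"field": source_field}},
            # Remove the base64 payload so the binary is not kept in the index.
            {"remove": {"field": source_field}},
        ],
    }


def build_document_body(raw_bytes, filename, source_field="data"):
    """The attachment processor expects the file content base64-encoded."""
    return {
        "filename": filename,
        source_field: base64.b64encode(raw_bytes).decode("ascii"),
    }


pipeline = build_pipeline_body()
doc = build_document_body(b"hello attachment", "note.txt")
print(json.dumps(pipeline, indent=2))
print(json.dumps(doc, indent=2))
```

Note that with this approach the whole file travels through the cluster at index time, even if the base64 field is removed afterwards.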

Could you please advise whether this approach will scale as the number of attachments increases?
