For my project, we are storing the data in Elasticsearch.
I have a new requirement regarding documents (i.e. the actual binaries):
Store documents and be able to search the text within the document binary.
Allow the user to download the document after I display the search results.
I came across a few discussions mentioning that Elasticsearch is not designed to store large BLOBs.
I need a suggestion: should I store the documents in Elasticsearch or on a file system?
If you decode the blobs (i.e. extract their text) and then index that text, Elasticsearch can search it. Otherwise, storing the blobs isn't worth it; you would just store the metadata there and search on that.
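One way to do the decoding inside Elasticsearch is an ingest pipeline with the attachment processor (Apache Tika based; depending on your version it may need to be installed as the `ingest-attachment` plugin). The sketch below only builds the request bodies as Python dicts. The pipeline id `extract-text` and the field names `data`, `filename` and `attachment` are hypothetical choices for illustration. The `remove` processor drops the base64 payload after extraction so the binary is not kept in `_source`.

```python
import base64
import json

# Hypothetical pipeline id; used as PUT _ingest/pipeline/extract-text
PIPELINE_ID = "extract-text"


def attachment_pipeline() -> dict:
    """Pipeline body: extract text from the base64 `data` field, then drop it."""
    return {
        "description": "Extract text from uploaded files",
        "processors": [
            # Decodes `data` and writes extracted text/metadata under `attachment`
            {"attachment": {"field": "data", "target_field": "attachment"}},
            # Do not keep the raw binary in the indexed document
            {"remove": {"field": "data"}},
        ],
    }


def index_request_body(file_bytes: bytes, filename: str) -> dict:
    """Document body for PUT <index>/_doc/<id>?pipeline=extract-text."""
    return {
        "filename": filename,
        # The attachment processor expects base64-encoded content
        "data": base64.b64encode(file_bytes).decode("ascii"),
    }


if __name__ == "__main__":
    print(json.dumps(attachment_pipeline(), indent=2))
    print(json.dumps(index_request_body(b"hello", "notes.txt")))
```

After indexing, the extracted text is searchable under `attachment.content`, while the original file still has to live somewhere else if users need to download it.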
Storing large binary objects in Elasticsearch is not recommended. Instead, store the extracted text in Elasticsearch together with the location of the binary object (e.g. on S3), so you can retrieve the file from there when needed.
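A minimal sketch of that split, assuming S3 as the object store: Elasticsearch holds the searchable text plus a pointer, and the binary stays in the bucket. The field names, the `FileDoc` class, the bucket `kb-files` and the helper functions are all hypothetical, chosen only to illustrate the shape of the index document and mapping.

```python
from dataclasses import asdict, dataclass


@dataclass
class FileDoc:
    """What goes into Elasticsearch: searchable text plus a pointer to the binary."""
    title: str
    content_type: str
    extracted_text: str
    storage_uri: str  # location of the binary in object storage, not the bytes


# Hypothetical mapping: full-text search on the extracted text; the storage
# location is a keyword that is kept in _source but not indexed for search.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content_type": {"type": "keyword"},
            "extracted_text": {"type": "text"},
            "storage_uri": {"type": "keyword", "index": False},
        }
    }
}


def s3_uri(bucket: str, key: str) -> str:
    return f"s3://{bucket}/{key}"


def parse_s3_uri(uri: str) -> tuple:
    """Split s3://bucket/key back into (bucket, key) for the download step."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def make_doc(title: str, content_type: str, text: str, bucket: str, key: str) -> dict:
    return asdict(FileDoc(title, content_type, text, s3_uri(bucket, key)))


if __name__ == "__main__":
    doc = make_doc("Onboarding guide", "application/pdf",
                   "Welcome to the team ...", "kb-files", "docs/guide.pdf")
    print(doc["storage_uri"], parse_s3_uri(doc["storage_uri"]))
```

When the user clicks a search hit, the application reads `storage_uri` from the hit and fetches the object from S3, for example by handing out a pre-signed URL (boto3's `generate_presigned_url("get_object", ...)`), so the file never passes through Elasticsearch.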
The requirement is to build a knowledge-exchange site with images, Word documents, PDFs, etc. The number of files will increase over time. For now, files are expected to be up to 10 MB each.
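If files are pushed through an ingest pipeline as above, they travel base64-encoded inside the JSON request, which grows them by about a third. A quick check of what a 10 MiB file becomes, assuming standard padded base64: the encoded length is 4 × ⌈n / 3⌉ bytes, i.e. 13,981,016 bytes (about 13.3 MiB) for 10 MiB. That is well below Elasticsearch's default `http.max_content_length` of 100mb, but it is still per-request payload (and, without a `remove` step, per-document `_source`), which is one more reason to keep the binaries out of the index.

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of the standard (padded) base64 encoding of n_bytes of data."""
    # 4 output characters per 3-byte input group, last group padded
    return 4 * ((n_bytes + 2) // 3)


MAX_FILE = 10 * 1024 * 1024  # 10 MiB, the expected upper bound from the question

if __name__ == "__main__":
    # Cross-check the formula against the real encoder on a small sample
    assert base64_size(1000) == len(base64.b64encode(bytes(1000)))
    print(base64_size(MAX_FILE))
```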