Is Elasticsearch meant for long term storage of large datasets?

I have users that each have hundreds of GBs of documents (PDF, Word, etc.). I want to use Elasticsearch with the ingest-attachment plugin to make the plaintext contents of these documents searchable. These documents are unlikely to ever be deleted, so the dataset will only grow.
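For context, this is roughly how I'd send each file in. A minimal sketch of building the index request body; the `data` and `filename` field names and the `attachment` pipeline name are just my assumptions, matching the common convention where the attachment processor reads base64-encoded file content from a source field:

```python
import base64
import json

def build_attachment_request(file_bytes: bytes, filename: str) -> str:
    """Build the JSON body for indexing one file through an
    attachment-style ingest pipeline."""
    doc = {
        "filename": filename,
        # The attachment processor expects the raw file bytes
        # base64-encoded in the field the pipeline is configured
        # to read from (here assumed to be "data").
        "data": base64.b64encode(file_bytes).decode("ascii"),
    }
    return json.dumps(doc)

body = build_attachment_request(b"%PDF-1.4 ...", "report.pdf")
# This body would then be indexed with something like:
#   PUT /docs/_doc/1?pipeline=attachment
```

The base64 encoding inflates the payload on the wire, but only the extracted plaintext ends up indexed (more on that below in the replies).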

My question: is this a use case that Elasticsearch can safely handle, or will I eventually hit a wall of sorts if the dataset gets too big? My main worry is that text search will get slower as users upload more documents.


Hundreds of GBs is not really a "large" dataset in Elasticsearch terms; you can search TBs of data with a single node and there are clusters out there containing PBs of data that ingest hundreds of GBs of new documents every hour.

Moreover, PDFs and Word documents typically contain a lot of unsearchable overhead, so the plaintext content that Elasticsearch actually indexes is often many times smaller than the total file size.

I'm not saying there are no scaling limits, of course, and this all depends on your usage pattern and performance goals too, but in terms of data size alone you're well within the comfort zone.


Excellent, thanks for the feedback.

