I have users who each have hundreds of GBs of documents (PDF, Word, etc.). I want to use Elastic with the ingest-attachment plugin to make the plaintext contents of these documents searchable. It’s unlikely these documents will ever be deleted, so the dataset will mostly keep growing.
My question: is this a use case that Elastic can safely handle? Or will I eventually run into a wall of sorts if the dataset gets too big? What I’m mostly worried about is that text searches will get slower as users upload more and more documents.
Hundreds of GBs is not really a "large" dataset in Elasticsearch terms; you can search TBs of data with a single node and there are clusters out there containing PBs of data that ingest hundreds of GBs of new documents every hour.
Moreover, PDFs and Word documents typically contain a lot of unsearchable overhead, so the size of the plaintext content that Elasticsearch actually sees is often many times smaller than the total file size.
I'm not saying that there are no scaling limits, of course, and this all depends on your usage pattern and performance goals too, but in terms of data size alone you're well within the comfort zone.
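For reference, here is a minimal sketch of the pipeline-plus-index flow you describe, using the elasticsearch 8.x Python client. The pipeline, index, field names, and file path are placeholders, not anything your setup requires:

```python
import base64
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local node

# Ingest pipeline that runs the attachment processor on a base64-encoded
# "data" field and extracts the plaintext into attachment.content.
es.ingest.put_pipeline(
    id="docs-attachment",
    processors=[{"attachment": {"field": "data", "remove_binary": True}}],
)

# Index one file through the pipeline; only the extracted text gets analyzed.
with open("report.pdf", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

es.index(index="user-docs", pipeline="docs-attachment", document={"data": encoded})

# Full-text search over the extracted content.
hits = es.search(
    index="user-docs",
    query={"match": {"attachment.content": "quarterly revenue"}},
)
```

Note that only the extracted text in `attachment.content` is indexed for search; with `remove_binary` enabled the original base64 payload isn't stored in the document at all, which keeps the index much smaller than the raw files.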