Elasticsearch as primary DB for document library

Gresmir · June 22, 2022, 8:54am

My task is to build a full-text search system for a really large number of documents (tens of millions). Right now I have the documents as RTF files plus their metadata, and all of this will be indexed in Elasticsearch. The documents are immutable (they can only be deleted). I don't expect many new documents per day, and I control when they are inserted. So is it a good idea to use Elasticsearch as the primary DB in this case?

Maybe I'll store the RTF files separately, but I really don't see the point of storing all this data somewhere else.
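
For illustration, here is a minimal sketch of how a batch of such documents could be prepared for the Elasticsearch `_bulk` API, which takes newline-delimited JSON: one action line followed by one source line per document. The index name `documents`, the field names (`title`, `body`, `rtf_path`) and the sample path are assumptions made up for this example, not anything from the setup described above. It only builds the payload and does not contact a cluster.

```python
import json


def build_bulk_body(docs, index_name="documents"):
    """Build an NDJSON payload for the Elasticsearch _bulk API.

    `docs` is an iterable of dicts with the hypothetical keys
    "id", "title", "text" and "rtf_path".
    """
    lines = []
    for doc in docs:
        # Action line: index this document under its own id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        # Source line: metadata, extracted text, and a pointer to the
        # original RTF file kept outside Elasticsearch.
        lines.append(json.dumps({
            "title": doc["title"],
            "body": doc["text"],
            "rtf_path": doc["rtf_path"],
        }))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


sample = [{
    "id": "doc-1",
    "title": "Example",
    "text": "plain text extracted from the RTF",
    "rtf_path": "/archive/doc-1.rtf",
}]
payload = build_bulk_body(sample)
```

The resulting string would be sent as the body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header. Storing only a path to the RTF keeps the index smaller, at the cost of needing a second store for the originals.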

whatgeorgemade · June 22, 2022, 9:30am

Elasticsearch should be fine for this use case. Just be sure to keep snapshots. Keeping the original documents is also a good idea, so that you can completely rebuild the indices if the worst happens, or if you hit bugs in ingest and need to start from scratch.
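
As a rough sketch of what keeping snapshots involves, the snapshot REST API needs two requests: one to register a repository and one to create a snapshot in it. The helpers below only build the method, path and JSON body for each request. The repository name, snapshot name, index name and filesystem location are hypothetical, and a shared-filesystem (`fs`) repository is assumed, which requires the location to be listed under `path.repo` on every node.

```python
import json


def register_repository_request(repo, location):
    """Request that registers a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return ("PUT", f"/_snapshot/{repo}", json.dumps(body))


def create_snapshot_request(repo, snapshot, indices):
    """Request that snapshots the given indices into an existing repository."""
    body = {
        # Comma-separated list of indices to include.
        "indices": ",".join(indices),
        # Leave cluster-wide state out; only the document indices matter here.
        "include_global_state": False,
    }
    return ("PUT", f"/_snapshot/{repo}/{snapshot}", json.dumps(body))


# Hypothetical names, for illustration only.
repo_req = register_repository_request("library_backup", "/mnt/es_backups")
snap_req = create_snapshot_request("library_backup", "snapshot-1", ["documents"])
```

Each tuple can be sent with any HTTP client. Snapshots are incremental, so taking one after every batch of inserts is usually cheap when the documents themselves never change.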

Depending on how you define 'really large amount', Elasticsearch could be overkill. If you don't need to distribute the data over multiple nodes and have some Java expertise, using vanilla Lucene is worth thinking about.

Gresmir · June 22, 2022, 12:05pm

I have a couple of million documents. By a rough estimate, the RTF files will add up to nearly a petabyte.
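
As a back-of-the-envelope check on those figures, the average file size can be derived directly. This sketch assumes that "a couple of million" means about 2 million documents and that a petabyte is the decimal 10^15 bytes; both numbers are assumptions for illustration.

```python
PETABYTE = 10**15          # bytes, decimal definition (assumption)
doc_count = 2_000_000      # "a couple of million" documents (assumption)

avg_bytes_per_doc = PETABYTE / doc_count
avg_mb_per_doc = avg_bytes_per_doc / 10**6

print(f"Average size per document: {avg_mb_per_doc:.0f} MB")
```

Under those assumptions the average works out to 500 MB per RTF file.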
