Hi all, I need to estimate the storage and computing requirements for deploying an Elasticsearch solution capable of handling a fairly large document collection.
Here are some of the key requirements:
- the repository should index 4 billion documents
- the estimated average document size is 15 KB per document in TXT format (approximately 60 TB of data; a quick arithmetic check follows this list)
- each document will have a set of associated metadata (document date, document title, document category, document ID, ...; approximately 2 additional KB of data per document)
- every day, 2 million new documents should be added to the collection (and hence indexed), and a similar number of documents should be deleted
- the average number of queries per day could be in the 30,000-50,000 range
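
For reference, here is the back-of-envelope arithmetic behind the figures above, using only the numbers stated in the requirements:

```python
# Back-of-envelope arithmetic using only the figures stated above.
DOCS = 4_000_000_000        # total documents in the repository
DOC_KB = 15                 # average TXT size per document, in KB
META_KB = 2                 # metadata per document, in KB
DAILY_CHURN = 2_000_000     # documents added (and deleted) per day

raw_tb = DOCS * DOC_KB / 1024**3        # KB -> TB
meta_tb = DOCS * META_KB / 1024**3
daily_gb = DAILY_CHURN * (DOC_KB + META_KB) / 1024**2  # KB -> GB

print(f"raw text:     {raw_tb:.1f} TB")        # ~55.9 TB (the ~60 TB above)
print(f"metadata:     {meta_tb:.1f} TB")       # ~7.5 TB
print(f"daily ingest: {daily_gb:.1f} GB/day")  # ~32.4 GB/day of new data
```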
In a similar scenario:
- What would the overall storage requirements be (indexes, temporary areas for indexing, caches, etc.)? (I sketch my own rough model after these questions.)
- What would the recommended configuration be for the servers dedicated to indexing (number of servers, recommended hardware sizing, etc.)?
- What would the recommended configuration be for the servers dedicated to searching (number of servers, recommended hardware sizing, etc.)?
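
For the first question, here is the rough model I've been working with so far. The replication factor and overhead multipliers are my own guesses, not Elasticsearch recommendations, and I'd be glad to have them corrected:

```python
# Rough storage model -- the multipliers below are my assumptions,
# not established Elasticsearch figures, so please correct them.
raw_tb = 55.9 + 7.5        # text + metadata, from the arithmetic above

REPLICAS = 1               # assumed: one replica per primary shard
INDEX_OVERHEAD = 1.1       # assumed: ~10% for inverted index structures
HEADROOM = 1.3             # assumed: ~30% free space for segment merges

total_tb = raw_tb * (1 + REPLICAS) * INDEX_OVERHEAD * HEADROOM
print(f"estimated cluster storage: {total_tb:.0f} TB")  # ~181 TB under these assumptions
```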
Any suggestions would be appreciated. Thanks!