Hi all, I need to estimate the storage and computing requirements for deploying an Elasticsearch solution capable of handling a fairly large document collection.
Here are some of the key requirements:
- the repository should index 4 billion documents
- the estimated average document size is 15 KB per document in TXT format (approximately 60 TB of data; a quick arithmetic check follows this list)
- each document will have a set of associated metadata (document date, document title, document category, document ID, ...; approximately 2 additional KB of data per document)
- every day, 2 million new documents should be added to the collection (and hence indexed), and a similar number of documents should be deleted
- the average number of queries per day could be in the 30,000-50,000 range
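
For reference, here is the back-of-envelope arithmetic behind the figures above, using only the numbers stated in the requirements:

```python
# Back-of-envelope arithmetic using only the figures stated above.
DOCS = 4_000_000_000        # total documents in the repository
DOC_KB = 15                 # average TXT size per document, in KB
META_KB = 2                 # metadata per document, in KB
DAILY_CHURN = 2_000_000     # documents added (and deleted) per day

raw_tb = DOCS * DOC_KB / 1024**3        # KB -> TB
meta_tb = DOCS * META_KB / 1024**3
daily_gb = DAILY_CHURN * (DOC_KB + META_KB) / 1024**2  # KB -> GB

print(f"raw text:     {raw_tb:.1f} TB")        # ~55.9 TB (the ~60 TB above)
print(f"metadata:     {meta_tb:.1f} TB")       # ~7.5 TB
print(f"daily ingest: {daily_gb:.1f} GB/day")  # ~32.4 GB/day of new data
```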
In a similar scenario:
- What would the overall storage requirements be (indexes, temporary areas for indexing, caches, etc.)? (I sketch my own rough model after these questions.)
- What would the recommended configuration be for the servers dedicated to indexing (number of servers, recommended hardware sizing, etc.)?
- What would the recommended configuration be for the servers dedicated to searching (number of servers, recommended hardware sizing, etc.)?
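
For the first question, here is the rough model I've been working with so far. The replication factor and overhead multipliers are my own guesses, not Elasticsearch recommendations, and I'd be glad to have them corrected:

```python
# Rough storage model -- the multipliers below are my assumptions,
# not established Elasticsearch figures, so please correct them.
raw_tb = 55.9 + 7.5        # text + metadata, from the arithmetic above

REPLICAS = 1               # assumed: one replica per primary shard
INDEX_OVERHEAD = 1.1       # assumed: ~10% for inverted index structures
HEADROOM = 1.3             # assumed: ~30% free space for segment merges

total_tb = raw_tb * (1 + REPLICAS) * INDEX_OVERHEAD * HEADROOM
print(f"estimated cluster storage: {total_tb:.0f} TB")  # ~181 TB under these assumptions
```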
Any suggestions would be appreciated. Thanks!