What are possible hardware configurations for 80 GB per day data volume?

I am setting up a cluster on ES version 6.2.3 and I have the following scenario:

  • Data volume: 80 GB per day (with 1 month of retention)
  • Search scenarios: dashboards only, a few aggregation queries, max 50 users a day

Current Configuration -

  • 3 master-eligible nodes (8 GB RAM, 2-core CPU, 100 GB disk each)
  • 10 data nodes (8 GB RAM, 2-core CPU, 512 GB disk each, sized in total to retain data for 1 month)

There are two possible scenarios

  1. Data nodes also act as the HTTP-enabled (client-facing) nodes, or
  2. Add separate client nodes with HTTP enabled and node.data set to false (see the sketch below)
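
To make option 2 concrete, this is roughly how I would configure a coordinating-only (client) node in elasticsearch.yml on 6.x; the node name is just a placeholder:

```
# Coordinating-only ("client") node for option 2 (ES 6.x role settings).
# node.name is just a placeholder.
node.name: coord-01
node.master: false
node.data: false
node.ingest: false
# HTTP is enabled by default in 6.x, so this node accepts client requests
# and forwards them to the data nodes.
```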

Other settings -

  • The JVM heap (Xmx) is set to 3500m (as roughly 50% of RAM was recommended).
  • Shards per index = 5 (default)
  • Replicas per shard = 2
  • Master nodes are NOT data nodes
  • Refresh interval for indices is set to 30s
  • Each day's data will be stored in a separate index, with the settings applied per index as sketched below.
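
For reference, an index template along these lines would apply the per-index settings above to each daily index (the template name and index pattern are just examples):

```
# Template name and index pattern are examples only
PUT _template/daily-logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 2,
    "index.refresh_interval": "30s"
  }
}
```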

Is this configuration a good setup for my scenario?

Problems we faced while testing -

  • Encountered HTTP 502 responses from the HTTP APIs while load testing with this config
  • Nodes sometimes go down
  • A few of the shards end up UNASSIGNED

What can be the reasons for these issues? What should we monitor or change in the config?
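
In case it helps anyone answering: these are the standard 6.x APIs I am assuming we should start with to narrow this down (allocation explain reports why a shard is unassigned, and cat nodes shows heap/CPU pressure per node):

```
# Why is a shard unassigned? (explains the first unassigned shard it finds)
GET _cluster/allocation/explain

# Heap, memory and CPU pressure per node
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,node.role
```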

Your 80 GB of data per day may not be the same amount of data Elasticsearch ends up saving to disk; it depends on your mapping, sharding and other factors. So the first thing I would do is run a simulation, indexing a full day's worth of data with the mapping and index settings you intend to use in production. That will give you a better grasp of the amount of disk space you need.
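
Once that test day has been indexed, the cat indices API will show the actual primary and total on-disk size; the index name below is just an example of a daily index:

```
# Actual on-disk size of one day's index (index name is an example)
GET _cat/indices/logs-2018.04.01?v&bytes=gb&h=index,pri,rep,docs.count,pri.store.size,store.size
```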

10 data nodes with 512 GB of disk each gives you roughly 5 TB of disk space for data in the cluster, which doesn't sound like enough for the use case you've listed above. Consider this:

If you're going to use 2 replicas per primary shard, you need triple the disk space of what you store in the primary shards.

As an example, let's say you actually save 80 GB of primary data to disk every day; then you also save 2 x 80 = 160 GB of replica data per day, for a total of 240 GB to disk per day. With 30 days per month, that ends up at 240 GB x 30 = 7200 GB, which is about 2 TB more than what you have available in a cluster with 10 data nodes. This clearly won't work.

Ideally you should never use more than 70-80% of the disk space, because that leaves you no room for merging big shard segments or for re-indexing when you need to change a mapping. So if you aim for 7200 GB of data per month, I would recommend a cluster with at least 8000 GB of total disk space, as that would give you 800 GB, or 10%, of free disk space once the cluster has stored one month of data. In that case you'll need 8000 / 512 ≈ 15.6, i.e. 16 data nodes with 512 GB each.
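
Related to this, Elasticsearch enforces its own disk watermarks (the 6.x defaults are 85% low, 90% high and 95% flood stage); once a node crosses the low watermark no new shards are allocated to it, which is one common cause of UNASSIGNED shards on nearly-full clusters. Per-node disk usage is cheap to keep an eye on with the cat allocation API:

```
# How full is the disk on each data node?
GET _cat/allocation?v&h=node,shards,disk.indices,disk.used,disk.avail,disk.percent
```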

Alternatively, if you reduce the replica factor to just 1 you'll need a lot less disk space (just 4800 GB for 30 days).
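
If you go that route, the replica count can be lowered on existing indices at any time with a settings update (the index pattern below is just an example), and for new daily indices it would go into the index template:

```
# Lower the replica count on existing indices (index pattern is an example)
PUT logs-*/_settings
{
  "index.number_of_replicas": 1
}
```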


Adding to @Bernt_Rostad's great answer, here are some resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right


@Bernt_Rostad Thanks for this explanation. Actually, the data size is 80 GB per day including replication; my bad, I did not mention that. So we are allocating about 50% more disk space.

@dadoonet Thanks a lot for these valuable resources.

Excellent, then you should be in good shape regarding the disk space :slight_smile:
