What are possible hardware configurations for 80 GB per day data volume?

I am setting up a cluster on ES version 6.2.3 and I have the following scenario:

  • Data volume: 80 GB per day (with 1 month of retention)
  • Search scenarios: dashboards only, a few aggregation queries, max 50 users a day

Current Configuration -

  • 3 master-eligible nodes (8 GB RAM, 2-core CPU, 100 GB disk each)
  • 10 data nodes (8 GB RAM, 2-core CPU, 512 GB disk each, sized in total to retain data for 1 month)

There are two possible scenarios

  1. Data nodes also act as the HTTP-enabled (client-facing) nodes, or
  2. Add separate client nodes with HTTP enabled and node.data set to false (see the sketch below)
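
To make option 2 concrete, this is roughly how I would configure a coordinating-only (client) node in elasticsearch.yml on 6.x; the node name is just a placeholder:

```
# Coordinating-only ("client") node for option 2 (ES 6.x role settings).
# node.name is just a placeholder.
node.name: coord-01
node.master: false
node.data: false
node.ingest: false
# HTTP is enabled by default in 6.x, so this node accepts client requests
# and forwards them to the data nodes.
```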

Other settings -

  • The JVM heap (Xmx) is set to 3500m (as roughly 50% of RAM was recommended).
  • Shards per index = 5 (default)
  • Replicas per shard = 2
  • Master nodes are NOT data nodes
  • Refresh interval for indices is set to 30s
  • Each day's data will be stored in a separate index, with the settings applied per index as sketched below.
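
For reference, an index template along these lines would apply the per-index settings above to each daily index (the template name and index pattern are just examples):

```
# Template name and index pattern are examples only
PUT _template/daily-logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 2,
    "index.refresh_interval": "30s"
  }
}
```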

Is this configuration a good setup for my scenario?

Problems we faced while testing -

  • Encountered HTTP 502 responses from the HTTP APIs while load testing with this config
  • Nodes sometimes go down
  • A few of the shards end up UNASSIGNED

What can be the reasons for these issues? What should we monitor or change in the config?
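
In case it helps anyone answering: these are the standard 6.x APIs I am assuming we should start with to narrow this down (allocation explain reports why a shard is unassigned, and cat nodes shows heap/CPU pressure per node):

```
# Why is a shard unassigned? (explains the first unassigned shard it finds)
GET _cluster/allocation/explain

# Heap, memory and CPU pressure per node
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,node.role
```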

Your 80 GB of data per day may not be the same amount of data Elasticsearch ends up saving to disk; it depends on your mapping, sharding and other factors. So the first thing I would do is run a simulation, indexing a full day's worth of data with the mapping and index settings you intend to use in production. That will give you a better grasp of the amount of disk space you need.
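
Once that test day has been indexed, the cat indices API will show the actual primary and total on-disk size; the index name below is just an example of a daily index:

```
# Actual on-disk size of one day's index (index name is an example)
GET _cat/indices/logs-2018.04.01?v&bytes=gb&h=index,pri,rep,docs.count,pri.store.size,store.size
```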

10 data nodes with 512 GB of disk each gives you roughly 5 TB of disk space for data in the cluster, which doesn't sound like enough for the use case you've listed above. Consider this:

If you're going to use 2 replicas per primary shard, you need triple the disk space of what you store in the primary shards.

As an example, let's say you actually save 80 GB of primary data to disk every day; then you also save 2 x 80 = 160 GB of replica data per day, for a total of 240 GB to disk per day. With 30 days per month, that ends up at 240 GB x 30 = 7200 GB, which is about 2 TB more than what you have available in a cluster with 10 data nodes. This clearly won't work.

Ideally you should never use more than 70-80% of the disk space, because that leaves you no room for merging big shard segments or for re-indexing when you need to change a mapping. So if you aim for 7200 GB of data per month, I would recommend a cluster with at least 8000 GB of total disk space, as that would give you 800 GB, or 10%, of free disk space once the cluster has stored one month of data. In that case you'll need 8000 / 512 ≈ 15.6, i.e. 16 data nodes with 512 GB each.
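
Related to this, Elasticsearch enforces its own disk watermarks (the 6.x defaults are 85% low, 90% high and 95% flood stage); once a node crosses the low watermark no new shards are allocated to it, which is one common cause of UNASSIGNED shards on nearly-full clusters. Per-node disk usage is cheap to keep an eye on with the cat allocation API:

```
# How full is the disk on each data node?
GET _cat/allocation?v&h=node,shards,disk.indices,disk.used,disk.avail,disk.percent
```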

Alternatively, if you reduce the replica factor to just 1 you'll need a lot less disk space (just 4800 GB for 30 days).
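
If you go that route, the replica count can be lowered on existing indices at any time with a settings update (the index pattern below is just an example), and for new daily indices it would go into the index template:

```
# Lower the replica count on existing indices (index pattern is an example)
PUT logs-*/_settings
{
  "index.number_of_replicas": 1
}
```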


Adding to @Bernt_Rostad's great answer, here are some resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right


@Bernt_Rostad Thanks for this explanation. Actually, the data size is 80 GB per day including replication; my bad, I did not mention that. So we are allocating about 50% more disk space.

@dadoonet Thanks a lot for these valuable resources.

Excellent, then you should be in good shape regarding the disk space :slight_smile:
