Add more data nodes, or more RAM/disk space to existing nodes?

This is our current cluster setup on VMware:
3 master nodes: 8GB RAM each, plus
2 data nodes: 8GB RAM / 500GB disk each, with each node's storage on a different physical disk for redundancy.

We recently started collecting NetFlow data with ElastiFlow. The data volume is about 25GB per day; with one replica, that's about 50GB of disk space per day. The retention period will be at least five days.
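
For reference, a quick back-of-the-envelope calculation of what that implies in disk terms (only the 25GB/day, one-replica, and five-day figures come from above; the per-node split is simple arithmetic):

```python
# Back-of-the-envelope disk sizing for the NetFlow indices described above.
daily_primary_gb = 25    # raw NetFlow data indexed per day
replicas = 1             # one replica of every primary shard
retention_days = 5       # minimum retention period

total_gb = daily_primary_gb * (1 + replicas) * retention_days
print(f"Total cluster disk needed: {total_gb} GB")                    # 250 GB
print(f"Option 1, 4 x 500GB nodes: {total_gb / 4:.1f} GB per node")   # 62.5 GB
print(f"Option 2, 2 x 1TB nodes:   {total_gb / 2:.1f} GB per node")   # 125.0 GB
```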

To accommodate the increasing data volume, we are considering two scaling options:

  1. Add two more data nodes (8GB RAM, 500GB disk) to the cluster, making sure each node's storage is on a different physical disk (thanks to shard allocation awareness; see the sketch after this list).
    The cluster would then be 3 master nodes (8GB RAM each) plus 4 data nodes (8GB RAM, 500GB disk each).

  2. Add 8GB RAM and 500GB of disk to each of the two existing data nodes.
    The cluster would then be 3 master nodes (8GB RAM each) plus 2 data nodes (16GB RAM, 1TB disk each).
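
Note that shard allocation awareness does not happen automatically; it keys off a node attribute you define yourself. A minimal sketch, assuming the 8.x elasticsearch-py client and a hypothetical node attribute named `disk_id` that identifies the physical disk behind each node:

```python
# Minimal sketch of enabling shard allocation awareness.
# Assumes the 8.x elasticsearch-py client and a hypothetical node
# attribute "disk_id" identifying the physical disk behind each node.
#
# The attribute itself goes in each node's elasticsearch.yml
# (it cannot be set through the API):
#   node.attr.disk_id: disk1   # disk2, disk3, ... on the other nodes
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Tell the allocator to keep a primary and its replica on different
# disk_ids, so one physical disk failure never loses both copies.
es.cluster.put_settings(
    persistent={"cluster.routing.allocation.awareness.attributes": "disk_id"}
)
```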

What are the pros and cons for each option? Any hints?

Thanks.

If we assume that CPU resources also increase by the same amount in both cases, I would not necessarily expect much difference in performance. If anything, I would expect the 2 larger nodes to potentially perform better, as there is less inter-node network traffic.

The main difference is probably what happens in failure scenarios. If you have 2 nodes and 1 replica, both nodes hold all the data. If a node fails, Elasticsearch has nowhere to relocate the missing shards, and you will be running with just primary shards until the node comes back.

If you have 4 nodes, Elasticsearch can and will recover missing shards onto the remaining nodes should a node fail. Exactly how these recovered shards are spread across the nodes depends on whether shard allocation awareness is used. In this case you will still have replicas of at least some of your shards, which increases reliability, but Elasticsearch will also try to fit all shards onto just 3 disks, which could cause problems if you are using most of your disk space.
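
To make that last point concrete, here is a rough sketch of per-node disk usage in the 4-node option before and after a node failure, reusing the (assumed) sizing figures from the question:

```python
# Rough estimate of disk usage per node before and after a node failure
# in the 4-node option (figures carried over from the sizing above).
total_data_gb = 250        # primaries + replicas, 5 days of NetFlow
disk_per_node_gb = 500

before = total_data_gb / 4   # data spread over 4 nodes
after = total_data_gb / 3    # shards recovered onto the 3 survivors

print(f"Before failure: {before:.1f} GB/node "
      f"({before / disk_per_node_gb:.1%} of disk)")   # 62.5 GB, 12.5%
print(f"After failure:  {after:.1f} GB/node "
      f"({after / disk_per_node_gb:.1%} of disk)")    # ~83.3 GB, ~16.7%
# Plenty of headroom at these numbers, but the closer you run to the
# default disk watermarks (85% low / 90% high), the more likely a node
# failure pushes the surviving nodes over the limit.
```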

For your case, I suggest:

increase both RAM and disk on each node

This is because using Elasticsearch as a time-series database (if I understood your use case correctly) will suffer from:

    1. High on-heap memory usage for terms and fielddata storage.
    2. High off-heap memory usage for doc-values preloading.
    3. High I/O utilization when querying and fetching data points.
    4. High disk space usage due to the poor compression rate.

but will rarely, if ever, hit a CPU bottleneck.

So, before adding nodes, keep increasing the memory and disk space of each node until you hit a CPU bottleneck.
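
If it helps, here is a small sketch of how you might watch for that bottleneck, assuming the official elasticsearch-py client; the 85% thresholds in the comments are illustrative rules of thumb, not official guidance:

```python
# Sketch: poll node stats to see whether memory or CPU is the current
# bottleneck (assumes the official elasticsearch-py client).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
stats = es.nodes.stats(metric="jvm,os,indices")

for node_id, node in stats["nodes"].items():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    cpu_pct = node["os"]["cpu"]["percent"]
    fielddata_b = node["indices"]["fielddata"]["memory_size_in_bytes"]
    print(f"{node['name']}: heap {heap_pct}%, cpu {cpu_pct}%, "
          f"fielddata {fielddata_b / 2**20:.0f} MiB")
    # Illustrative rule of thumb: sustained heap > 85% suggests more RAM;
    # sustained CPU > 85% with healthy heap suggests adding nodes instead.
```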
