Add more data nodes, or more RAM/disk space to existing nodes?

This is our current cluster setup on VMware:
3 master nodes: 8GB RAM each, plus
2 data nodes: 8GB RAM / 500GB disk each, with each node's storage on a different physical disk for redundancy.

We recently started collecting NetFlow data with ElastiFlow. The data volume is about 25GB per day; with one replica, that's about 50GB of disk space per day. The retention period will be at least five days.
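
For reference, a quick back-of-the-envelope calculation of what that implies in disk terms (only the 25GB/day, one-replica, and five-day figures come from above; the per-node split is simple arithmetic):

```python
# Back-of-the-envelope disk sizing for the NetFlow indices described above.
daily_primary_gb = 25    # raw NetFlow data indexed per day
replicas = 1             # one replica of every primary shard
retention_days = 5       # minimum retention period

total_gb = daily_primary_gb * (1 + replicas) * retention_days
print(f"Total cluster disk needed: {total_gb} GB")                    # 250 GB
print(f"Option 1, 4 x 500GB nodes: {total_gb / 4:.1f} GB per node")   # 62.5 GB
print(f"Option 2, 2 x 1TB nodes:   {total_gb / 2:.1f} GB per node")   # 125.0 GB
```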

To accommodate the increasing data volume, we are considering two scaling options:

  1. Add two more data nodes (8GB RAM, 500GB disk) to the cluster, making sure each node's storage is on a different physical disk (thanks to shard allocation awareness; see the sketch after this list).
    The cluster would then be 3 master nodes (8GB RAM each) plus 4 data nodes (8GB RAM, 500GB disk each).

  2. Add 8GB RAM and 500GB of disk to each of the two existing data nodes.
    The cluster would then be 3 master nodes (8GB RAM each) plus 2 data nodes (16GB RAM, 1TB disk each).
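
Note that shard allocation awareness does not happen automatically; it keys off a node attribute you define yourself. A minimal sketch, assuming the 8.x elasticsearch-py client and a hypothetical node attribute named `disk_id` that identifies the physical disk behind each node:

```python
# Minimal sketch of enabling shard allocation awareness.
# Assumes the 8.x elasticsearch-py client and a hypothetical node
# attribute "disk_id" identifying the physical disk behind each node.
#
# The attribute itself goes in each node's elasticsearch.yml
# (it cannot be set through the API):
#   node.attr.disk_id: disk1   # disk2, disk3, ... on the other nodes
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Tell the allocator to keep a primary and its replica on different
# disk_ids, so one physical disk failure never loses both copies.
es.cluster.put_settings(
    persistent={"cluster.routing.allocation.awareness.attributes": "disk_id"}
)
```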

What are the pros and cons for each option? Any hints?

Thanks.

If we assume that CPU resources also increase by the same amount in both cases, I would not necessarily expect much difference in performance. If anything, I would expect the 2 larger nodes to potentially perform better, as there is less inter-node network traffic.

The main difference is probably what happens in failure scenarios. If you have 2 nodes and 1 replica, both nodes hold all the data. If a node fails, Elasticsearch has nowhere to relocate the missing shards, and you will be running with just primary shards until the node comes back.

If you have 4 nodes, Elasticsearch can and will recover missing shards onto the remaining nodes should a node fail. Exactly how these recovered shards are spread across the nodes depends on whether shard allocation awareness is used. In this case you will still have replicas of at least some of your shards, which increases reliability, but Elasticsearch will also try to fit all shards onto just 3 disks, which could cause problems if you are using most of your disk space.
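
To make that last point concrete, here is a rough sketch of per-node disk usage in the 4-node option before and after a node failure, reusing the (assumed) sizing figures from the question:

```python
# Rough estimate of disk usage per node before and after a node failure
# in the 4-node option (figures carried over from the sizing above).
total_data_gb = 250        # primaries + replicas, 5 days of NetFlow
disk_per_node_gb = 500

before = total_data_gb / 4   # data spread over 4 nodes
after = total_data_gb / 3    # shards recovered onto the 3 survivors

print(f"Before failure: {before:.1f} GB/node "
      f"({before / disk_per_node_gb:.1%} of disk)")   # 62.5 GB, 12.5%
print(f"After failure:  {after:.1f} GB/node "
      f"({after / disk_per_node_gb:.1%} of disk)")    # ~83.3 GB, ~16.7%
# Plenty of headroom at these numbers, but the closer you run to the
# default disk watermarks (85% low / 90% high), the more likely a node
# failure pushes the surviving nodes over the limit.
```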

For your case, I suggest:

increase both RAM and disk on each node

This is because using Elasticsearch as a time-series database (if I understood your use case correctly) will suffer from:

    1. High on-heap memory usage for terms and fielddata storage.
    2. High off-heap memory usage for doc-values preloading.
    3. High I/O utilization when querying and fetching data points.
    4. High disk space usage due to the poor compression rate.

but will rarely, if ever, hit a CPU bottleneck.

So, before adding nodes, keep increasing the memory and disk space of each node until you hit a CPU bottleneck.
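
If it helps, here is a small sketch of how you might watch for that bottleneck, assuming the official elasticsearch-py client; the 85% thresholds in the comments are illustrative rules of thumb, not official guidance:

```python
# Sketch: poll node stats to see whether memory or CPU is the current
# bottleneck (assumes the official elasticsearch-py client).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
stats = es.nodes.stats(metric="jvm,os,indices")

for node_id, node in stats["nodes"].items():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    cpu_pct = node["os"]["cpu"]["percent"]
    fielddata_b = node["indices"]["fielddata"]["memory_size_in_bytes"]
    print(f"{node['name']}: heap {heap_pct}%, cpu {cpu_pct}%, "
          f"fielddata {fielddata_b / 2**20:.0f} MiB")
    # Illustrative rule of thumb: sustained heap > 85% suggests more RAM;
    # sustained CPU > 85% with healthy heap suggests adding nodes instead.
```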
