I read this article (Benchmarking and sizing your Elasticsearch cluster for logs and metrics | Elastic Blog). It explains how to calculate the total data, the total storage, and the number of data nodes in a cluster:
* Total Data (GB) = Raw data (GB) per day * Number of days retained * (Number of replicas + 1) * Indexing/Compression Factor
* Total Storage (GB) = Total Data (GB) * (1 + 0.15 disk watermark threshold + 0.1 margin of error)
* Total Data Nodes = ROUNDUP(Total Storage (GB) / Memory per data node / Memory:Data ratio)

In the case of a large deployment, it's safer to add a node for failover capacity.
The following is the simple example given in the article:
* Total Data (GB) = 1GB x (9 x 30 days) x 2 = 540GB
* Total Storage (GB) = 540GB x (1 + 0.15 + 0.1) = 675GB
* Total Data Nodes = 675GB disk / 8GB RAM / 30 ratio = 3 nodes
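To make sure I understand the formulas, here is a small Python sketch of the calculation using the example's numbers (the variable names are mine, not from the article):

```python
import math

# Inputs taken from the article's example.
raw_per_day_gb = 1        # raw data ingested per day
days_retained = 9 * 30    # 9 months of retention
replicas = 1              # one replica -> factor of (replicas + 1) = 2
compression_factor = 1    # indexing/compression factor, assumed 1 here
ram_per_node_gb = 8       # RAM per data node
memory_data_ratio = 30    # Memory:Data ratio

total_data_gb = raw_per_day_gb * days_retained * (replicas + 1) * compression_factor
total_storage_gb = total_data_gb * (1 + 0.15 + 0.1)  # watermark + margin of error
data_nodes = math.ceil(total_storage_gb / ram_per_node_gb / memory_data_ratio)

print(total_data_gb, total_storage_gb, data_nodes)  # 540 675.0 3
```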
My question: no part of the calculation relates to the disk storage attached to each node (for example, the size of the EBS volume on an EC2 instance).
In that example, what if the storage per node is only 100GB? According to the article, 3 nodes are sufficient, but 3 x 100GB = 300GB is far less than the total storage (675GB).
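The arithmetic behind my concern can be sketched like this (100GB of usable disk per node is the hypothetical figure from my question, not a number from the article):

```python
import math

total_storage_gb = 675    # total storage from the article's example
disk_per_node_gb = 100    # hypothetical usable disk per node

# Node count required by disk capacity alone.
nodes_by_disk = math.ceil(total_storage_gb / disk_per_node_gb)
print(nodes_by_disk)  # 7, more than the 3 nodes the RAM-based formula gives
```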
Is there no need to consider storage per node in addition to total storage?