How to calculate disk space needed for a cluster?

I would like a rough estimate of how much disk space is required for a cluster. I found a paper that includes the following equation for calculating disk space:

[equation from the paper not reproduced here]

I have tried to find Elastic documentation on estimating disk space, but cannot find any. The paper does not say whether the equation was made up or taken from another source.

Is this equation accurate?

It's not completely inaccurate, but it is certainly an oversimplification. The 0.85 in the denominator implies that the size of each shard on disk is about 117% (1/0.85) of the input size. In fact, the ratio of on-disk size to input size can vary greatly depending on your configuration. Here is an old blog post showing various configurations with ratios between 61% and 140%:

https://www.elastic.co/blog/elasticsearch-storage-the-true-story
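To get a feel for how much that ratio matters, here is a minimal sketch with a hypothetical raw data volume (the 500 GB figure is made up; the ratios are the 61% and 140% extremes from the blog post plus the 117% implied by the paper's equation):

```python
# Rough on-disk size = raw input size * expansion ratio.
raw_gb = 500  # hypothetical raw input size in GB

for label, ratio in [("best case from the blog post (0.61)", 0.61),
                     ("paper's implied ratio (1/0.85)", 1 / 0.85),
                     ("worst case from the blog post (1.40)", 1.40)]:
    print(f"{label}: {raw_gb * ratio:.0f} GB on disk")
```

With the same input, the estimate ranges from roughly 305 GB to 700 GB, which is why a single fixed factor is an oversimplification.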

As David points out, this seems to be a simplification. I do, however, wonder whether the 0.85 factor is meant to account for the fact that the disk watermarks prevent you from using the full disk capacity, so you need some headroom. This blog post also provides some information, as does this one.


Good point, it could indeed be for the default low disk watermark. In that case there is no accounting for compression/expansion at all.
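If the 0.85 is indeed headroom for the default low disk watermark (85%), a sketch of how that factor would enter the estimate, under that assumption and with a made-up index size, looks like this:

```python
# Assumption: the 0.85 models the default low disk watermark (85% disk usage),
# i.e. total disk capacity must exceed the expected index size by enough margin
# that the watermark is never crossed.
expected_index_gb = 585  # hypothetical on-disk index size in GB
low_watermark = 0.85     # default low disk watermark

required_capacity_gb = expected_index_gb / low_watermark
print(f"required disk capacity: {required_capacity_gb:.0f} GB")
```

Note that this factor only buys headroom against the watermark; it says nothing about how the raw data expands or compresses when indexed.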

In an update-heavy environment, be sure to take deleted docs into consideration as this could add another 50% to the index size.
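A hedged way to fold that into the same back-of-the-envelope calculation (the 50% figure is the worst-case overhead mentioned above, not a fixed constant):

```python
# Update-heavy workloads accumulate deleted docs in segments until merges
# reclaim the space; as noted above this can add up to ~50% to the index size.
base_index_gb = 585          # hypothetical on-disk index size in GB
deleted_docs_overhead = 0.5  # worst-case overhead from deleted documents

update_heavy_index_gb = base_index_gb * (1 + deleted_docs_overhead)
print(f"update-heavy estimate: {update_heavy_index_gb:.0f} GB")
```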

