Disk space consideration for Elasticsearch in production

krishna_chaitanya · January 25, 2017, 9:57pm

I was going through two good posts (here and here) to determine the size of ES servers in production.

Do we need to account for extra free space on disk? For example, if all of my indices occupy, say, 3 TB (with replication), can I start 3 nodes with 1 TB of disk on each node? Will that work?

Or do I need to keep 50% of the space unused for internal Elasticsearch operations, i.e. 1.5 TB on each node?

Christian_Dahlqvist · January 25, 2017, 10:05pm

As Elasticsearch writes to immutable segments during indexing, which then are merged, you will need some free disk space to account for this. Exactly how much will depend on the workload. You should also consider what would happen if you lost one node. In this case, assuming that you have a replication factor of 1, Elasticsearch will want to allocate the shards on the missing node to the other nodes in the cluster, which in your case could result in about 1.5TB of data per node. If there is not enough room, replica shards will remain unassigned, which may be perfectly acceptable for your use case.
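The node-loss arithmetic above can be sketched as follows. This is only an illustration, assuming shards are spread evenly across the surviving nodes; the function name is hypothetical and it does not model merge overhead.

```python
def data_per_node_after_failure(total_data_tb: float, nodes: int, failed: int = 1) -> float:
    """Data each surviving node would hold if all shards were reallocated evenly.

    Assumes an even shard distribution, which a real cluster only approximates.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return total_data_tb / survivors


# 3 TB of index data (including replicas) on 3 nodes, one node lost.
print(data_per_node_after_failure(3.0, 3))  # 1.5
```

With 1 TB disks, the 1.5 TB per surviving node does not fit, which is why some replica shards would stay unassigned.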

krishna_chaitanya · January 25, 2017, 10:34pm

So, suppose I have 3 x 1.5 TB nodes, each with 500 GB of free disk space for these merge operations and for handling a node failure.

If 1 node goes down in the above cluster, 1 TB of index data needs to be redistributed to the 2 other nodes to restore replication (1 replica). So the free 500 GB on each of these nodes is filled up, and they don't have sufficient space for merge operations until a new node is added. Am I right?

So, is it better to keep, say, 50% of the disk free in this case? That would make each node 2 TB.
My load is something close to 10 GB/day (without replication).
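To compare the two node sizes discussed here, a rough sketch of the remaining headroom after one node fails might look like this. It assumes even shard distribution, 1 TB = 1000 GB as in the figures above, and 1 replica; the helper names are hypothetical, and merge overhead and Elasticsearch's disk-based allocation thresholds are not modeled.

```python
def free_after_failure_gb(total_data_gb: float, nodes: int,
                          disk_per_node_gb: float, failed: int = 1) -> float:
    """Free disk per surviving node after shards are reallocated evenly."""
    survivors = nodes - failed
    return disk_per_node_gb - total_data_gb / survivors


def days_until_full(free_gb: float, daily_primary_gb: float,
                    replicas: int, survivors: int) -> float:
    """Days of ingest a surviving node can absorb before its disk is full."""
    daily_per_node = daily_primary_gb * (1 + replicas) / survivors
    return free_gb / daily_per_node


for disk_gb in (1500, 2000):
    free = free_after_failure_gb(3000, 3, disk_gb)
    days = days_until_full(free, daily_primary_gb=10, replicas=1, survivors=2)
    print(f"{disk_gb} GB nodes: {free:.0f} GB free per node, about {days:.0f} days of ingest")
```

Under these assumptions, 1.5 TB nodes are left with no free space after a failure, while 2 TB nodes keep 500 GB each.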

Christian_Dahlqvist · January 26, 2017, 5:35am

Having more disk space will also give you room to grow, but regarding the relocation of shards on node failure, it comes down to how long you need to handle a node being down and whether or not it is acceptable to have some replica shards unassigned during this time.

krishna_chaitanya · January 26, 2017, 3:28pm

Thanks for the info. I looked into the documentation and found this. Very helpful.

krishna_chaitanya · January 26, 2017, 7:26pm

I have one more question, about how master election is done if I have 3 nodes (2 master+data and 1 dedicated master).

I have gone through this post, where you gave some details about using such a configuration (starting with 3 nodes) for small clusters. With 10 GB/day and 3 TB of index data, I believe my cluster is also a small one. I have also gone through the master-election documentation here.

Since I don't want my data nodes to be doing extra work, I would like my only dedicated master node to be the actual master. How can I make sure that happens during master election? Or is this the default behavior?

system · February 23, 2017, 7:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
