How much disk space to allocate on master node before creating replicas?

Hi,

I want to know how much disk space I should allocate for my master node.

I have a primary shard of size x and I want n replicas.

Are replicas created on the master node fully and then copied to a sister node? If yes, then I will need at least (n+1)x space on my master node, so that all replicas are created and sent to the sister nodes.

But if the replicas are built up gradually on each sister node, then I can allocate much less space on my master node, which would be great!

In the same vein, if my primary shard becomes huge and I want to reindex it to break it up into smaller shards, will I need the same amount of disk space on the master node itself as the original primary shard to create more shards for the new index?

Please give me some suggestions/pointers.
Thank you in advance.

Are replicas created on the master node fully and then copied to a sister node?

The short answer is no. The master node (in Elasticsearch's definition of master) is not involved in the actual replication process. It only processes the command to add more replicas, assigns the newly created replicas to other nodes in the cluster, and publishes the new cluster state, which contains the new number of shards and their allocations. When the other nodes receive the new cluster state, they create replica shards accordingly and start the replication process from the primary shards.
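For illustration, the "command to add more replicas" is an update to the index's `number_of_replicas` setting. Here is a minimal Python sketch that only builds such a request and does not send it. The host `http://localhost:9200`, the index name `my-index`, and the helper name `build_replica_request` are assumptions made up for this example.

```python
import json
from urllib.request import Request


def build_replica_request(base_url: str, index: str, replicas: int) -> Request:
    """Build (but do not send) a PUT <index>/_settings request
    that changes the number of replicas for an index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, used only for illustration.
req = build_replica_request("http://localhost:9200", "my-index", 2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The request body is tiny. The shard data itself is copied from the nodes holding the primaries to the nodes that were assigned the replicas, not through the master.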

If by "master node" you mean the node where your primaries are currently allocated, then there will be a slight impact on those nodes if you continue to actively index while the replicas are being created. When replication starts, all newly indexed records are kept in the transaction log, so if you are actively indexing during this procedure, you should expect slightly higher disk usage on the nodes with primary shards until replication finishes.

In the same vein, if my primary shard becomes huge and I want to reindex it to break it up into smaller shards, will I need the same amount of disk space on the master node itself as the original primary shard to create more shards for the new index?

Again, the master node has nothing to do with indexing. You can actually create a master node that does not hold any data at all, and this is the configuration we recommend for production Elasticsearch clusters. See https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html for more information. That said, for reindexing you will need additional space. Basically, during reindexing you need enough space to allocate at least the primaries of the new index, assuming that you reindex into an index without replicas, remove the replicas of the old index, and then create replicas for the new one.
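To make the reindexing sequence above concrete, here is a rough back-of-the-envelope model in Python of the cluster-wide disk usage at each step. It assumes the new index ends up about the same size as the old one and ignores transaction logs and segment merging, so treat the numbers as a lower bound. The function name and the example sizes are made up for illustration.

```python
def reindex_disk_stages(primary_total: float, replicas: int) -> dict[str, float]:
    """Approximate cluster-wide disk usage at each step of the
    reindexing sequence, in the same unit as primary_total.

    primary_total: combined size of all primary shards of the old index
    replicas:      number of replicas per primary
    Assumption: the new index is roughly as large as the old one.
    """
    old_full = primary_total * (1 + replicas)  # old primaries + old replicas
    return {
        "before reindex": old_full,
        "new index without replicas": old_full + primary_total,
        "old replicas removed": 2 * primary_total,
        "new replicas created": primary_total + old_full,
    }


# Example: 3 primaries of 100 GB each (300 GB total) with 1 replica.
stages = reindex_disk_stages(300, 1)
for step, used in stages.items():
    print(f"{step}: {used} GB")

extra_needed = max(stages.values()) - stages["before reindex"]
print(f"extra space needed at peak: {extra_needed} GB")
```

Under these assumptions, the peak is one extra copy of the primaries on top of what the old index already uses, which matches the "at least primaries for the new index" rule of thumb.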

So, does this mean that if I have n primary shards of size x each on different data nodes, then at least 1 data node should have nx free disk space for re-indexing to be successful?

Unless you are messing with allocation settings, Elasticsearch will try to allocate shards evenly across all your data nodes. So it's better to say that you should have at least nx free disk space across all your data nodes combined.
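As a rough illustration of what "nx across all data nodes combined" means per node, here is a small Python sketch. It assumes perfectly even shard allocation and equally sized shards, which real clusters only approximate. The function name and the example numbers are hypothetical.

```python
import math


def free_space_needed(shards: int, shard_size_gb: float, data_nodes: int) -> tuple[float, float]:
    """Return (total, per_node) free space in GB needed for the new primaries.

    Assumes shards are spread as evenly as possible, so the busiest
    node receives ceil(shards / data_nodes) shards.
    """
    total = shards * shard_size_gb
    per_node = math.ceil(shards / data_nodes) * shard_size_gb
    return total, per_node


# Example: n = 5 shards of x = 40 GB on 3 data nodes.
total, per_node = free_space_needed(5, 40, 3)
print(f"total: {total} GB, busiest node: {per_node} GB")
```

These are minimums only. Actual free space should be comfortably above them.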

I should also mention that it's a really bad idea to run Elasticsearch very close to running out of disk space. Running out of disk space creates all sorts of problems, and disk usage by shards fluctuates constantly due to segment merging, so it's very easy to underestimate the disk space you need.

This is much clearer now. Thank you for the quick answer, Igor.