Disk threshold

We are trying to implement a single-node cluster of Elasticsearch per installation of our application. Our application is on-prem. I was looking at the index-data-location parameters in the Elastic documentation and learned there are disk-based shard allocator settings with the following parameters:

  1. cluster.routing.allocation.disk.threshold_enabled
  2. cluster.routing.allocation.disk.watermark.low
  3. cluster.routing.allocation.disk.watermark.high
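For reference, the defaults documented in the reference manual look roughly like this, sketched in `elasticsearch.yml` form (confirm the exact values against your Elasticsearch version):

```yaml
# Default disk-based shard allocation settings (per the reference manual).
cluster.routing.allocation.disk.threshold_enabled: true        # enabled by default
cluster.routing.allocation.disk.watermark.low: "85%"           # stop allocating new replica shards to this node
cluster.routing.allocation.disk.watermark.high: "90%"          # try to relocate shards away from this node
cluster.routing.allocation.disk.watermark.flood_stage: "95%"   # mark affected indices read-only
```

These are dynamic cluster settings, so they can also be changed at runtime via the cluster settings API rather than in the config file.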

I would need to know: does Elasticsearch enable the disk threshold by default?

I don’t see the use case for keeping it enabled by default in a single-node cluster where we will have multiple disks on a single node/server. We will write our own disk alerts to monitor the thresholds.

It would be good if someone could explain what the low and high watermarks mean, with some simple examples.
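To illustrate with concrete numbers of my own (not from the docs): on a node with a 1 TB disk and the default percentage watermarks, crossing ~850 GB used (85%) means the node stops receiving new replica shards, and at ~900 GB used (90%) Elasticsearch tries to move shards off the node. The watermarks can also be set as absolute free-space values instead of percentages; the sizes below are hypothetical:

```yaml
# Absolute free-space watermarks (all three must use the same style,
# either percentages or byte values; these sizes are just examples).
cluster.routing.allocation.disk.watermark.low: "100gb"          # act when less than 100 GB free
cluster.routing.allocation.disk.watermark.high: "50gb"
cluster.routing.allocation.disk.watermark.flood_stage: "10gb"
```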

I think the reference manual answers most of these questions so I suggest starting there.

The low and high watermarks are only really useful for logging in a single-node setup, but the flood_stage watermark is still valuable there.
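To spell out what flood_stage does: when a node's disk usage exceeds it, Elasticsearch applies a write block to every index that has a shard on that node. The effect is equivalent to this index setting:

```yaml
# Applied automatically by Elasticsearch to indices with shards on a node
# that has exceeded the flood_stage watermark; writes are then rejected.
index.blocks.read_only_allow_delete: true
```

In recent versions this block is released automatically once disk usage falls back below the high watermark; check your version's reference manual for the exact behaviour.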


Hi @DavidTurner

Could you please let me know how the high watermark is helpful in a single-node cluster setup with "one node, one disk", given that it works by relocating shards across nodes?

In a single-node cluster, I believe the low watermark and flood_stage are the ones that will play an important role?

Kindly confirm!

For a single data node you need to worry only about the flood_stage configuration; the low and high watermarks will have no impact.

The low watermark does not affect the primary shards of new indices, as explained in the reference manual:

This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.

Since you have a single node, you do not have any replicas.

You also do not have other nodes, so the high watermark has no impact either, as there is nowhere to move shards to.

The flood_stage watermark is the one that will impact you, since it will block writes on your node.
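If writes do get blocked and your version does not release the block automatically, you can clear it yourself after freeing disk space by resetting the index setting; `my-index` here is a placeholder name:

```yaml
# PUT /my-index/_settings  (body shown in YAML form; my-index is a placeholder)
# Setting the block to null removes it, re-enabling writes.
index.blocks.read_only_allow_delete: null
```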

As David said before, the low and high watermarks are useful just for logging; you should monitor your nodes' logs and use this information to decide when to act and free disk space.

Thank you! This information is helpful.

Elsewhere you are asking about a design in which you have multiple disks, and therefore multiple nodes, all running on a single host, so I think you don't have a single-node cluster as you claim here. In that case the watermarks all work as documented to move shards around to ensure that no node's disk gets too full.
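A sketch of that kind of design, one node per disk on the same host (node names, paths, and ports below are hypothetical, not from this thread):

```yaml
# elasticsearch.yml for the first node, storing data on disk 1
node.name: node-1
path.data: /mnt/disk1/es-data
http.port: 9200

# A second node runs as a separate process with its own config, e.g.:
# node.name: node-2
# path.data: /mnt/disk2/es-data
# http.port: 9201
```

With multiple nodes, the high watermark regains its normal meaning: a node whose disk fills past it can shed shards to the other nodes on the host.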


Thanks David! We are planning to integrate Elasticsearch with an enterprise application. We are heavily dependent on indexing. Our application has use cases for scaling storage vertically as well as horizontally. Existing customers have stored around 10 TB of index data on a single server with the older indexing engine. Now we are seeing a challenge in fulfilling the vertical-scaling use case with Elasticsearch's horizontal-scaling model: with one node and one data path, we won't be able to keep the existing functionality of adding drives and continuing to scale the server vertically.

Anyway, you guys are very prompt in answering and clarifying my doubts!