We are trying to implement a single-node Elasticsearch cluster per installation of our application. Our application is on-prem. I was looking at the indexing data location parameters in the Elastic documentation and came to know there are disk-based shard allocation settings with the following parameters:
cluster.routing.allocation.disk.threshold_enabled
cluster.routing.allocation.disk.watermark.low
cluster.routing.allocation.disk.watermark.high
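From my reading of the documentation, the defaults would look roughly like this in elasticsearch.yml (the values below are my understanding of the documented defaults, please correct me if I have them wrong):

```yaml
# Disk-based shard allocation defaults, as I understand them:
cluster.routing.allocation.disk.threshold_enabled: true      # enabled by default
cluster.routing.allocation.disk.watermark.low: "85%"         # stop allocating new replica shards to the node
cluster.routing.allocation.disk.watermark.high: "90%"        # start relocating shards away from the node
cluster.routing.allocation.disk.watermark.flood_stage: "95%" # mark indices on the node read-only
```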
I would need to know whether Elasticsearch enables the disk threshold by default.
I don't see the use case for keeping it enabled by default in a single-node cluster where we will have multiple disks on a single node/server. We will write our own disk alerts to monitor the threshold.
It would be good if someone could explain the meaning of the low and high watermarks with some simple examples.
Could you also let me know how the high watermark is helpful in a single-node ("one node, one disk") setup, given that it works by relocating shards across nodes?
In a single-node cluster, I believe the low watermark and flood_stage will play the important roles?
For a single data node you need to worry only about the flood_stage configuration; the low and high watermarks will have no impact.
The low watermark does not impact the primary shards of new indices, as explained in the reference manual.
> This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.
Since you have a single node, you do not have any replicas.
You also do not have other nodes, so the high watermark has no impact either, as there is nowhere to move shards.
The flood_stage is the one that will impact you, since it will block writes on your node.
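Concretely, when the flood stage is reached, Elasticsearch puts an `index.blocks.read_only_allow_delete` block on the indices. On recent versions (7.4 and later) the block is released automatically once disk usage drops below the high watermark again; on older versions you have to clear it yourself after freeing disk space, something along these lines:

```
PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```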
As David said before, the low and high watermarks will be useful just for logging; you should monitor the logs of your nodes and use this information to decide when to act and free up disk space.
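Since you mentioned writing your own disk alerts, here is a small sketch of how the percentage watermarks translate into byte thresholds. The 85/90/95% figures are the documented defaults; the 1 TB disk size is just a made-up example:

```python
def watermark_thresholds(disk_bytes, low=0.85, high=0.90, flood=0.95):
    """Return the used-bytes level at which each watermark trips.

    Percentages match the documented Elasticsearch defaults; override
    the keyword arguments if your cluster uses custom watermarks.
    """
    return {
        "low": int(disk_bytes * low),
        "high": int(disk_bytes * high),
        "flood_stage": int(disk_bytes * flood),
    }

# Example: a 1 TB data disk (illustrative size).
one_tb = 1_000_000_000_000
print(watermark_thresholds(one_tb))
# low trips at 850 GB used, high at 900 GB, flood_stage at 950 GB
```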
Elsewhere you are asking about a design in which you have multiple disks, and therefore multiple nodes, all running on a single host, so I think you don't have a single-node cluster as you claim here. In that case the watermarks all work as documented to move shards around to ensure that no node's disk gets too full.
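For example, two nodes sharing one host, each with its own disk, might be configured along these lines (cluster name, node names, paths and ports here are all illustrative):

```yaml
# node-1/config/elasticsearch.yml
cluster.name: my-cluster
node.name: node-1
path.data: /mnt/disk1/es-data
http.port: 9200
transport.port: 9300

# node-2/config/elasticsearch.yml
cluster.name: my-cluster
node.name: node-2
path.data: /mnt/disk2/es-data
http.port: 9201
transport.port: 9301
```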
Thanks David! We are planning to integrate Elasticsearch with our enterprise application. We are heavily dependent on indexing. Our application has use cases for scaling storage vertically as well as horizontally. Existing customers have stored around 10 TB of indexing data on a single server with the older indexing engine. Now we are seeing a challenge fulfilling the vertical-scaling use case with Elasticsearch's horizontal-scaling model. With one node per data path, we won't be able to keep the existing functionality of adding drives to keep scaling the server vertically.
Anyway, you are all very prompt in answering and clarifying my doubts!