Which disk watermark is actually applied when it is specified in both the node configuration file and the cluster settings?

Hello, I have a cluster of several nodes, where each node's configuration file contains the following disk watermark settings:

cluster.routing.allocation.disk.watermark.low=60gb
cluster.routing.allocation.disk.watermark.high=50gb
cluster.routing.allocation.disk.watermark.flood_stage=30gb

If I then change the cluster watermarks in Kibana Dev Tools with the following request, which watermark values will actually be applied?

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"
  }
}

Also, what would happen if I set different watermark values in each node's configuration file?

Thanks for your help!

Welcome to our community! :smiley:

It will be the ones you set via the cluster settings API, as those are cluster-level settings and override the values in each node's configuration file.
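
If you want to verify which values the cluster actually ends up using, something along these lines in Dev Tools should show the persistent settings alongside the defaults (just search the output for "watermark"):

GET _cluster/settings?include_defaults=true&flat_settings=true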

Thanks! So if I don't set any cluster settings via the API, how does the cluster determine the watermarks? I'd expect it to follow each node's configuration file.

For example, in a cluster, here's the configuration for each node:

cluster.routing.allocation.disk.watermark.low=60gb
cluster.routing.allocation.disk.watermark.high=50gb
cluster.routing.allocation.disk.watermark.flood_stage=30gb

Since each node has its own configuration file, my immediate question out of curiosity is:
What would happen if I set different watermark values for each node in its configuration file (assuming I don't set any cluster settings via the API)?

I haven't tried it myself, and I couldn't find any documentation about this in the Elasticsearch Guide. Thanks again.

Shard allocation decisions, including those involving disk watermarks, are taken on the elected master, so it's the settings on the master that matter.
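
If you're not sure which node that is, something like this in Dev Tools should show the currently elected master:

GET _cat/master?v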

First, the priority of the settings is as follows:
transient -> persistent -> yml
Transient settings do not survive a full cluster restart; transient and persistent settings are both cluster level.

Second, if you configure different nodes with different yml settings, each node executes its own policy.
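
For example, if the same setting has both a transient and a persistent value, the transient one wins; clearing the transient value (by setting it to null) should make the persistent one apply again, something like:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": null
  }
}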

Thanks! I'm not sure I understand what "cluster level" means here.

In my understanding, once I have set persistent watermarks, the watermarks in each node's yml configuration will be ignored. Is that true?

Yes, this is the configuration priority: transient -> persistent -> yml

Perhaps worth linking to the docs about this since they have some more detail (see the disk-based shard allocation settings page in the Elasticsearch reference).

One more thing, about this point above: "each node executes its own policy".

Another friend's reply says differently, i.e. that in this case each node will follow the master node's settings.

Some settings take effect at the node level so you can set them differently on each node. But the disk watermarks aren't like that because they affect shard allocation decisions which are only computed on the master, so it's the master's setting which matters.
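
If you want to compare the two, something like this should show what each node has picked up from its own configuration file, and what has been set at the cluster level:

GET _nodes/settings
GET _cluster/settings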

Thanks for pointing out the reference and the clear answer, David!

So, may I ask one more question here?

Background:
Let's say the disk flood stage watermark is set to 30 GB. If one node reaches this watermark (i.e. it has less than 30 GB of disk available), the indices that have shards on that node will be marked as read_only_allow_delete (is that right?). And the documentation says "When disk usage on the affected node drops below the high watermark, Elasticsearch automatically removes the write block.".
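
(For example, I'd expect to be able to see the block on an affected index with something like the request below, using my-index as a placeholder for the index name.)

GET my-index/_settings/index.blocks*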

My Issue:
Now, I've recently run into a problem where some indices were automatically given this read_only_allow_delete flag. I checked the cluster allocation, and the available disk space is more than 50 GB, well above the 30 GB flood stage and also above the 50 GB high watermark in this case. But the flag still wasn't removed from the indices automatically, so I had to remove it myself (with the request shown below).
Do you have thoughts on this problem?
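
For reference, I removed the block per index with something like this (using my-index as a placeholder for the affected index):

PUT my-index/_settings
{
  "index.blocks.read_only_allow_delete": null
}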

Thanks!

Elasticsearch emits logs that describe the decisions it's making in relation to disk watermarks. You'll need to look at them, or share them here (in full) if you would like help interpreting them.

Thank you, David.
Yeah, I checked the logs last time.

Actually, the cluster I'm talking about is deployed as Docker containers.
When I checked the logs last time, I ran the following command on the node server with the least available disk space of all the nodes in the cluster (because I thought that was the problem node!):

docker container logs es_container_id --since '2022-02-22'

But I didn't find anything relevant about "read_only_allow_delete" or "watermark" in them!

Now that you mentioned the logs, I am wondering:
When I want to see the Elasticsearch logs about this read_only_allow_delete issue (why and when Elasticsearch adds the flag), which node in the cluster should I log in to in order to find them? The master node, or the one with the least available disk space? Or is there a better way to view the logs to debug this problem?

I forget, but it might be the master. I'd normally just look at them all.
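
On a Docker deployment you could filter each container's logs for the disk-threshold messages, something along these lines (reusing your es_container_id as the example container):

docker container logs es_container_id 2>&1 | grep -i watermark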

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.