Low disk watermark [15%] exceeded on

sunilmchaudhari · October 27, 2015, 9:03am

Hi,
I have a 3-node cluster in our environment. On each node, the data path has 70GB of space available.
Still, ES is showing "low disk watermark [15%] exceeded on".
Can anybody explain why that is?

br,
Sunil Chaudhari.

dadoonet · October 27, 2015, 9:19am

You have less than 15% of the total disk space remaining free.

You can change this setting to an absolute value or change the percentage.

Look at https://www.elastic.co/guide/en/elasticsearch/reference/current/disk.html
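As a rough illustration of the two forms the setting can take: in Elasticsearch 1.x a percentage watermark is a threshold on *used* disk (the low watermark defaults to 85%, which appears to be why the warning reports [15%], the free-space complement), while an absolute byte value such as 500mb is a threshold on *free* disk. The sketch below is a hypothetical Python helper, not Elasticsearch code; it only understands "NN%" and byte-suffixed values (b, kb, mb, gb, tb, treated as powers of 1024), and the exact comparison Elasticsearch uses may differ at the boundary.

```python
# Hypothetical helper (not part of Elasticsearch) that mimics how a disk
# watermark is interpreted: "NN%" caps *used* space, "NNgb" floors *free* space.

_UNITS = {"tb": 1024 ** 4, "gb": 1024 ** 3, "mb": 1024 ** 2, "kb": 1024, "b": 1}


def parse_watermark(value: str):
    """Return ("used_pct", float) or ("free_bytes", int) for a watermark string."""
    value = value.strip().lower()
    if value.endswith("%"):
        return "used_pct", float(value[:-1])
    # Check longer suffixes first so "5gb" is not mistaken for plain bytes.
    for suffix in ("tb", "gb", "mb", "kb", "b"):
        if value.endswith(suffix):
            return "free_bytes", int(float(value[: -len(suffix)]) * _UNITS[suffix])
    raise ValueError(f"unsupported watermark value: {value!r}")


def watermark_exceeded(setting: str, total_bytes: int, free_bytes: int) -> bool:
    """True if a disk with the given total/free bytes is past the watermark."""
    kind, threshold = parse_watermark(setting)
    if kind == "used_pct":
        used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_pct > threshold
    return free_bytes < threshold


if __name__ == "__main__":
    # /elasticsearch on sit-0, using the TRACE figures posted later in the thread.
    print(watermark_exceeded("85%", 84413169664, 64780533760))
    # A node whose reported disk is only ~4.7 GiB, checked against "5gb".
    print(watermark_exceeded("5gb", 5051023360, 4428247040))
```

The point of the split is that a percentage scales with the disk, while a byte value does not: a 5gb floor can never be met on a filesystem that is smaller than 5gb in total.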

sunilmchaudhari · October 27, 2015, 9:53am

Hi,
But look at the output below: /elasticsearch is the partition where the data files are located, and it is only 20% used.

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
                      9.6G  3.7G  5.5G  41% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/vg_main-lv_backup
                      4.8G  9.9M  4.5G   1% /backup
/dev/sda1             488M   61M  403M  14% /boot
/dev/mapper/vg_main-lv_home
                      7.6G  733M  6.5G  10% /home
/dev/mapper/vg_main-lv_log
                      9.6G   23M  9.1G   1% /log
/dev/mapper/vg_main-lv_tmp
                      4.8G   11M  4.5G   1% /tmp
/dev/mapper/vg_main-lv_var
                      4.8G  346M  4.2G   8% /var
/dev/mapper/vg_main-lv_varlog
                      4.8G  142M  4.4G   4% /var/log
/dev/mapper/vg_main-lv_varlogaudit
                      4.8G   36M  4.5G   1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
                       79G   15G   61G  20% /elasticsearch

dadoonet · October 27, 2015, 10:57am

Interesting. @dakrone, do you have an idea?

dakrone · October 27, 2015, 9:41pm

Can you enable TRACE logging for the cluster package on the master node for a little bit? It will log all of the collected disk stats about each of the nodes.

You should be able to do this with:

PUT /_cluster/settings
{
  "transient": {
    "logger.cluster": "TRACE"
  }
}
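For anyone scripting this rather than typing it into a console, the same transient settings update can be built with Python's standard library. This is a sketch under assumptions: the node is reachable at http://localhost:9200 (a hypothetical address) with no authentication, and the helper name cluster_settings_request is made up for illustration. The request is only built here; sending it is left to the commented-out call.

```python
import json
import urllib.request


def cluster_settings_request(settings: dict,
                             base_url: str = "http://localhost:9200",
                             persistent: bool = False) -> urllib.request.Request:
    """Build a PUT /_cluster/settings request (base_url is a hypothetical default)."""
    scope = "persistent" if persistent else "transient"
    body = json.dumps({scope: settings}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = cluster_settings_request({"logger.cluster": "TRACE"})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a running node at base_url):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Passing persistent=True writes the setting to the persistent scope instead, which survives a full cluster restart; transient is the better fit for temporary TRACE logging.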

dakrone · October 27, 2015, 10:01pm

Also, can you collect the output of df -h on all of the data nodes so I can correlate the reported disk usage with the actual disk usage?
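One way to gather numbers that are directly comparable is to ask the OS about the filesystem that actually contains the configured data path, instead of reading the whole df table. A minimal Python sketch, assuming the data path is /elasticsearch (substitute the node's real path.data); shutil.disk_usage and os.path.ismount are standard library calls, but whether Elasticsearch derives its figures in exactly the same way is an assumption.

```python
import os
import shutil


def mount_point(path: str) -> str:
    """Walk up from path until reaching the mount point of its filesystem."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the root without finding a mount
            break
        path = parent
    return path


def describe_data_path(path: str) -> dict:
    """Report which filesystem holds path, plus its total and free bytes."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "path": path,
        "mount_point": mount_point(path),
        "total_bytes": usage.total,
        "free_bytes": usage.free,
    }


if __name__ == "__main__":
    # "/elasticsearch" is the data mount from this thread; adjust as needed.
    target = "/elasticsearch" if os.path.exists("/elasticsearch") else "/"
    print(describe_data_path(target))
```

If mount_point comes back as something other than /elasticsearch, the data directory is not on the volume that df suggests it is.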

warkolm · October 28, 2015, 1:02am

Also, what version are you on?

sunilmchaudhari · October 28, 2015, 5:00am

Hi @dakrone,
Do I need to restart ES after enabling TRACE logging via the PUT command?

warkolm · October 28, 2015, 5:34am

You do not.

sunilmchaudhari · October 28, 2015, 6:58am

Hi @warkolm, @dakrone,
Below is consolidated information from my cluster.
ES version: 1.5.2
3 nodes on multiple hosts, as listed below:

"sit-0" master: true, data: true --> index.number_of_shards: 3, index.number_of_replicas: 1
"sit-1" master: false, data: true --> index.number_of_shards: 3, index.number_of_replicas: 1
"sit-2" master: false, data: true --> index.number_of_shards: 3, index.number_of_replicas: 1

df -h on sit-0

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
                      9.6G  3.7G  5.5G  41% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/vg_main-lv_backup
                      4.8G  9.9M  4.5G   1% /backup
/dev/sda1             488M   61M  403M  14% /boot
/dev/mapper/vg_main-lv_home
                      7.6G  733M  6.5G  10% /home
/dev/mapper/vg_main-lv_log
                      9.6G   23M  9.1G   1% /log
/dev/mapper/vg_main-lv_tmp
                      4.8G   11M  4.5G   1% /tmp
/dev/mapper/vg_main-lv_var
                      4.8G  346M  4.2G   8% /var
/dev/mapper/vg_main-lv_varlog
                      4.8G  623M  3.9G  14% /var/log
/dev/mapper/vg_main-lv_varlogaudit
                      4.8G   36M  4.5G   1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
                       79G   15G   61G  20% /elasticsearch

df -h on sit-1

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
                       99G   32G   62G  34% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/vg_main-lv_backup
                      4.8G  9.9M  4.5G   1% /backup
/dev/sda1             488M   61M  402M  14% /boot
/dev/mapper/vg_main-lv_home
                      7.6G  488M  6.8G   7% /home
/dev/mapper/vg_main-lv_log
                      9.6G   23M  9.1G   1% /log
/dev/mapper/vg_main-lv_tmp
                      4.8G  9.9M  4.5G   1% /tmp
/dev/mapper/vg_main-lv_var
                      4.8G  343M  4.2G   8% /var
/dev/mapper/vg_main-lv_varlog
                      4.8G   40M  4.5G   1% /var/log
/dev/mapper/vg_main-lv_varlogaudit
                      4.8G   39M  4.5G   1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
                       79G   56M   75G   1% /elasticsearch

df -h on sit-2

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
                       99G   20G   74G  22% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/vg_main-lv_backup
                      4.8G  9.9M  4.5G   1% /backup
/dev/sda1             488M   61M  403M  14% /boot
/dev/mapper/vg_main-lv_home
                      7.6G  255M  7.0G   4% /home
/dev/mapper/vg_main-lv_log
                      9.6G   23M  9.1G   1% /log
/dev/mapper/vg_main-lv_tmp
                      4.8G   11M  4.5G   1% /tmp
/dev/mapper/vg_main-lv_var
                      4.8G  344M  4.2G   8% /var
/dev/mapper/vg_main-lv_varlog
                      4.8G   40M  4.5G   1% /var/log
/dev/mapper/vg_main-lv_varlogaudit
                      4.8G   39M  4.5G   1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
                       79G  8.9G   66G  12% /elasticsearch

A few of the TRACE logs:

[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [fmJY4Z4ISjmSEX8jbdsJ7A] would have less than the required 5gb free bytes threshold (4428105937 bytes free), preventing allocation
[INFO ][cluster] [sit-master-data-node-0] updating [cluster.info.update.interval] from [1m] to [1m]
[INFO ][cluster.routing.allocation.decider] [sit-master-data-node-0] updating [cluster.routing.allocation.disk.watermark.low] to [80%]
[INFO ][cluster.routing.allocation.decider] [sit-master-data-node-0] updating [cluster.routing.allocation.disk.watermark.high] to [5gb]
[TRACE][cluster.service] ack received from node [[sit-master-data-node-0][oL29yf7LQI2pxFJy09sYhg][hostname.xyz.fi][inet[/xx.xxx.xx.xx:9300]]{master=true}], cluster_state update (version: 1695)
[TRACE][cluster.service  ] all expected nodes acknowledged cluster_state update (version: 1695)
[DEBUG][cluster.service  ] [sit-master-data-node-0] processing [cluster_update_settings]: done applying updated cluster_state (version: 1695)
[DEBUG][cluster.service ] [sit-master-data-node-0] processing [reroute_after_cluster_update_settings]: execute
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Can not allocate [[servicepoint-2015-10-13][3], node[null], [R], s[UNASSIGNED]] on node [fmJY4Z4ISjmSEX8jbdsJ7A] due to [ReplicaAfterPrimaryActiveAllocationDecider]

I hope I have provided all the information.

dakrone · October 28, 2015, 6:16pm

This is missing the logging from the master node. You should see messages produced by this logging statement:

logger.trace("node: [{}], most available: total disk: {}, available disk: {} / least available: total disk: {}, available disk: {}", nodeId, mostAvailablePath.getTotal(), leastAvailablePath.getAvailable(), leastAvailablePath.getTotal(), leastAvailablePath.getAvailable());

Do you have those logs on the master node?
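When a node has more than one path.data entry, that trace statement reports two data paths per node: the one with the most available space and the one with the least. The following Python sketch illustrates that selection; the DataPath type and both helper functions are hypothetical, and ranking paths by available bytes is an assumption about how the selection is made, not a reading of the Elasticsearch source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataPath:
    """Hypothetical stand-in for one data path's disk stats."""
    path: str
    total_bytes: int
    available_bytes: int


def least_and_most_available(paths: List[DataPath]) -> Tuple[DataPath, DataPath]:
    """Return (least_available, most_available), ranked by available bytes."""
    if not paths:
        raise ValueError("node reported no data paths")
    least = min(paths, key=lambda p: p.available_bytes)
    most = max(paths, key=lambda p: p.available_bytes)
    return least, most


def trace_line(node_id: str, paths: List[DataPath]) -> str:
    """Format a line shaped like the trace message quoted above."""
    least, most = least_and_most_available(paths)
    return (f"node: [{node_id}], most available: total disk: {most.total_bytes}, "
            f"available disk: {most.available_bytes} / least available: "
            f"total disk: {least.total_bytes}, available disk: {least.available_bytes}")
```

With a single data path, both halves of the line describe the same filesystem.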

sunilmchaudhari · October 29, 2015, 10:22am

Hi,
I have given a few logs below.

[INFO ][cluster.service          ] [sit-master-data-node-0] added {[sit-data-node-1][1WGmqNYBS4SJZUatz-3HTg][lus00080.lij.fi][inet[/xx.xxx.x.xx:9300]]{master=false},}, reason: zen-disco-receive(join from node[[sit-data-node-1][1WGmqNYBS4SJZUatz-3HTg][lus00080.lij.fi][inet[/xx.xxx.x.xx::9300]]{master=false}])
[DEBUG][cluster.service          ] [sit-master-data-node-0] publishing cluster state version 3167
[DEBUG][cluster.service          ] [sit-master-data-node-0] set local cluster state to version 3167
[DEBUG][cluster                  ] [sit-master-data-node-0] data node was added, retrieving new cluster info
[TRACE][cluster                  ] [sit-master-data-node-0] Performing ClusterInfoUpdateJob
[DEBUG][cluster.service          ] [sit-master-data-node-0] processing [zen-disco-receive(join from node[[sit-data-node-1][1WGmqNYBS4SJZUatz-3HTg][lus00080.lij.fi][inet[/xx.xxx.x.xx::9300]]{master=false}])]: done applying updated cluster_state (version: 3167)
[TRACE][cluster                  ] [sit-master-data-node-0] node: [1WGmqNYBS4SJZUatz-3HTg], total disk: 5051023360, available disk: 4428247040
[TRACE][cluster                  ] [sit-master-data-node-0] node: [oL29yf7LQI2pxFJy09sYhg], total disk: 84413169664, available disk: 64780533760
[TRACE][cluster                  ] [sit-master-data-node-0] shard: [.kibana][0][p] size: 15846
[TRACE][cluster                  ] [sit-master-data-node-0] shard: [ces-2015-10-14][0][p] size: 103966
[TRACE][cluster                  ] [sit-master-data-node-0] shard: [ces-2015-10-15][0][p] size: 58566
[TRACE][cluster                  ] [sit-master-data-node-0] shard: [ces-2015-10-17][0][p] size: 29547

[TRACE][cluster.routing.allocation.allocator] [sit-master-data-node-0] Try relocating shard for index index [sales-2015-10-29] from node [oL29yf7LQI2pxFJy09sYhg] to node [EDtnrBZGROiV8TJ00I4wwA]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [EDtnrBZGROiV8TJ00I4wwA] would have less than the required 5gb free bytes threshold (4426862320 bytes free), preventing allocation
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Can not allocate [[sales-2015-10-29][0], node[oL29yf7LQI2pxFJy09sYhg], [R], s[STARTED]] on node [EDtnrBZGROiV8TJ00I4wwA] due to [DiskThresholdDecider]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [EDtnrBZGROiV8TJ00I4wwA] would have less than the required 5gb free bytes threshold (4426932652 bytes free), preventing allocation
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Can not allocate [[sales-2015-10-29][2], node[oL29yf7LQI2pxFJy09sYhg], [R], s[STARTED]] on node [EDtnrBZGROiV8TJ00I4wwA] due to [DiskThresholdDecider]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
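When logs like these run to hundreds of lines, it can help to pull the node ID, threshold, and free-byte count out of each DiskThresholdDecider warning. A small Python sketch; the regular expression is written against the WARN lines as they appear in these 1.5.2 logs, and it is an assumption that other Elasticsearch versions use the same wording.

```python
import re

# Matches warnings of the form seen above, e.g.
# "After allocating, node [ID] would have less than the required 5gb free
#  bytes threshold (4426862320 bytes free), preventing allocation"
WARN_RE = re.compile(
    r"After allocating, node \[(?P<node>[^\]]+)\] would have less than the "
    r"required (?P<threshold>\S+) free bytes threshold "
    r"\((?P<free>\d+) bytes free\)"
)


def parse_disk_warnings(lines):
    """Yield (node_id, threshold, free_bytes) for each matching log line."""
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            yield m.group("node"), m.group("threshold"), int(m.group("free"))
```

Feeding it a whole log file (for example, an open file object) yields one tuple per blocked allocation, which makes it easy to see whether a single node accounts for all of them.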

dakrone · October 29, 2015, 8:49pm

Okay, it looks like it collected information about 2 of the nodes:

[TRACE][cluster                  ] [sit-master-data-node-0] node: [1WGmqNYBS4SJZUatz-3HTg], total disk: 5051023360, available disk: 4428247040
[TRACE][cluster                  ] [sit-master-data-node-0] node: [oL29yf7LQI2pxFJy09sYhg], total disk: 84413169664, available disk: 64780533760

However, EDtnrBZGROiV8TJ00I4wwA is the node that is actually having the allocation problem; see:

[TRACE][cluster.routing.allocation.allocator] [sit-master-data-node-0] Try relocating shard for index index [sales-2015-10-29] from node [oL29yf7LQI2pxFJy09sYhg] to node [EDtnrBZGROiV8TJ00I4wwA]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [EDtnrBZGROiV8TJ00I4wwA] would have less than the required 5gb free bytes threshold (4426862320 bytes free), preventing allocation

EDtnrBZGROiV8TJ00I4wwA has 4.1gb of free disk and the limit has been set to 5gb, so the shard cannot be allocated there.
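To make the arithmetic explicit: the node looks healthy by percentage (87.6% free) but fails the absolute check, because 5gb is a fixed byte floor rather than a share of the disk. A worked Python check using the figures from the log lines above, assuming 5gb means 5 * 1024**3 bytes (Elasticsearch byte-size units are binary).

```python
GIB = 1024 ** 3

threshold_bytes = 5 * GIB            # "5gb" high watermark from the logs
free_after_bytes = 4426862320        # "(4426862320 bytes free)" in the WARN line
free_pct = 87.67037529598754         # "has 87.67037529598754% free disk"

shortfall = threshold_bytes - free_after_bytes
print(f"threshold: {threshold_bytes} bytes")
print(f"short by:  {shortfall} bytes (~{shortfall / GIB:.2f} GiB)")

# The 80% low watermark set earlier in the thread is not the problem:
used_pct = 100.0 - free_pct
print(f"used: {used_pct:.1f}% -> over 80%? {used_pct > 80.0}")
print(f"blocked by 5gb floor? {free_after_bytes < threshold_bytes}")
```

So the node is only about 12% used, yet it is still short of the absolute floor by a little under 1 GiB.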

It should also have calculated the disk space for this node. Do you have a log line that looks like this:

[TRACE][cluster ] [sit-master-data-node-0] node: [EDtnrBZGROiV8TJ00I4wwA], total disk: NNNNNNN, available disk: MMMMMMM

Where NNNNNNN and MMMMMMM are numbers?
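To correlate the trace figures with the df -h output posted earlier, it helps to convert the raw byte counts to GiB (df -h uses powers of 1024). A short Python conversion of the two node lines quoted above; which partition each number corresponds to is an inference from the sizes, not something the logs state.

```python
GIB = 1024 ** 3

# (node id, total disk bytes, available disk bytes) from the TRACE lines above.
reported = [
    ("1WGmqNYBS4SJZUatz-3HTg", 5051023360, 4428247040),
    ("oL29yf7LQI2pxFJy09sYhg", 84413169664, 64780533760),
]


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB, rounded to two decimals."""
    return round(n_bytes / GIB, 2)


for node_id, total, available in reported:
    print(f"{node_id}: total {to_gib(total)} GiB, available {to_gib(available)} GiB")
```

The oL29yf7LQI2pxFJy09sYhg figures (sit-master-data-node-0, about 78.6 GiB total and 60.3 GiB available) line up with the 79G /elasticsearch volume on sit-0. The 1WGmqNYBS4SJZUatz-3HTg figures (sit-data-node-1, about 4.7 GiB total) are in the range of the 4.8G partitions such as /var, /tmp or /backup rather than /elasticsearch, which suggests that node may be measuring a different filesystem.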
