Hi,
I have a 3-node cluster in our environment. Each node has a data path with 70GB of space available.
Still, ES is showing "low disk watermark [15%] exceeded on".
Can anybody explain to me why that is?
br,
Sunil Chaudhari.
You have less than 15% of the total disk space remaining free.
You can change this setting to an absolute value or a different percentage.
Look at https://www.elastic.co/guide/en/elasticsearch/reference/current/disk.html
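For example, here is a minimal sketch of adjusting both thresholds via the cluster settings API (the values are illustrative, not recommendations; see the doc page above for how percentages vs. absolute values are interpreted):
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "10gb"
  }
}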
Hi,
But look at the output below: /elasticsearch is the partition where the data files are located, and it's only 20% used.
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
9.6G 3.7G 5.5G 41% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/mapper/vg_main-lv_backup
4.8G 9.9M 4.5G 1% /backup
/dev/sda1 488M 61M 403M 14% /boot
/dev/mapper/vg_main-lv_home
7.6G 733M 6.5G 10% /home
/dev/mapper/vg_main-lv_log
9.6G 23M 9.1G 1% /log
/dev/mapper/vg_main-lv_tmp
4.8G 11M 4.5G 1% /tmp
/dev/mapper/vg_main-lv_var
4.8G 346M 4.2G 8% /var
/dev/mapper/vg_main-lv_varlog
4.8G 142M 4.4G 4% /var/log
/dev/mapper/vg_main-lv_varlogaudit
4.8G 36M 4.5G 1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
79G 15G 61G 20% /elasticsearch
Interesting. @dakrone do you have an idea?
Can you enable TRACE logging for the cluster
package on the master node for a little bit? It will log all of the collected disk stats about each of the nodes.
You should be able to do that with:
PUT /_cluster/settings
{
  "transient": {
    "logger.cluster": "TRACE"
  }
}
Also, can you collect the output of df -h
on all of the data nodes so I can correlate the reported vs. actual disk? And what version are you on?
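As a cross-check from the Elasticsearch side, the cat allocation API reports per-node disk usage as ES itself sees it (the exact columns vary a bit between versions):
GET /_cat/allocation?v
Comparing that output against df -h from the hosts is exactly the correlation I'm after.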
Hi @dakrone,
Do I need to restart ES after enabling the TRACE log via the PUT command?
You do not.
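Transient cluster settings take effect immediately. Once we're done, you can turn the logging back down the same way (a sketch, using the same endpoint as above):
PUT /_cluster/settings
{
  "transient": {
    "logger.cluster": "INFO"
  }
}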
Hi @warkolm, @dakrone,
Below is the consolidated information from my cluster.
ES version: 1.5.2
3 nodes on separate hosts, given below.
# df -h on sit-0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
9.6G 3.7G 5.5G 41% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/mapper/vg_main-lv_backup
4.8G 9.9M 4.5G 1% /backup
/dev/sda1 488M 61M 403M 14% /boot
/dev/mapper/vg_main-lv_home
7.6G 733M 6.5G 10% /home
/dev/mapper/vg_main-lv_log
9.6G 23M 9.1G 1% /log
/dev/mapper/vg_main-lv_tmp
4.8G 11M 4.5G 1% /tmp
/dev/mapper/vg_main-lv_var
4.8G 346M 4.2G 8% /var
/dev/mapper/vg_main-lv_varlog
4.8G 623M 3.9G 14% /var/log
/dev/mapper/vg_main-lv_varlogaudit
4.8G 36M 4.5G 1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
79G 15G 61G 20% /elasticsearch
# df -h on sit-1
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
99G 32G 62G 34% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/mapper/vg_main-lv_backup
4.8G 9.9M 4.5G 1% /backup
/dev/sda1 488M 61M 402M 14% /boot
/dev/mapper/vg_main-lv_home
7.6G 488M 6.8G 7% /home
/dev/mapper/vg_main-lv_log
9.6G 23M 9.1G 1% /log
/dev/mapper/vg_main-lv_tmp
4.8G 9.9M 4.5G 1% /tmp
/dev/mapper/vg_main-lv_var
4.8G 343M 4.2G 8% /var
/dev/mapper/vg_main-lv_varlog
4.8G 40M 4.5G 1% /var/log
/dev/mapper/vg_main-lv_varlogaudit
4.8G 39M 4.5G 1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
79G 56M 75G 1% /elasticsearch
# df -h on sit-2
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_main-lv_root
99G 20G 74G 22% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/mapper/vg_main-lv_backup
4.8G 9.9M 4.5G 1% /backup
/dev/sda1 488M 61M 403M 14% /boot
/dev/mapper/vg_main-lv_home
7.6G 255M 7.0G 4% /home
/dev/mapper/vg_main-lv_log
9.6G 23M 9.1G 1% /log
/dev/mapper/vg_main-lv_tmp
4.8G 11M 4.5G 1% /tmp
/dev/mapper/vg_main-lv_var
4.8G 344M 4.2G 8% /var
/dev/mapper/vg_main-lv_varlog
4.8G 40M 4.5G 1% /var/log
/dev/mapper/vg_main-lv_varlogaudit
4.8G 39M 4.5G 1% /var/log/audit
/dev/mapper/vg_data-lv_elasticsearch
79G 8.9G 66G 12% /elasticsearch
A few TRACE logs:
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [fmJY4Z4ISjmSEX8jbdsJ7A] would have less than the required 5gb free bytes threshold (4428105937 bytes free), preventing allocation
[INFO ][cluster] [sit-master-data-node-0] updating [cluster.info.update.interval] from [1m] to [1m]
[INFO ][cluster.routing.allocation.decider] [sit-master-data-node-0] updating [cluster.routing.allocation.disk.watermark.low] to [80%]
[INFO ][cluster.routing.allocation.decider] [sit-master-data-node-0] updating [cluster.routing.allocation.disk.watermark.high] to [5gb]
[TRACE][cluster.service ] ack received from node [[sit-master-data-node-0][oL29yf7LQI2pxFJy09sYhg][hostname.xyz.fi][inet[/xx.xxx.xx.xx:9300]]{master=true}], cluster_state update (version: 1695)
[TRACE][cluster.service ] all expected nodes acknowledged cluster_state update (version: 1695)
[DEBUG][cluster.service ] [sit-master-data-node-0] processing [cluster_update_settings]: done applying updated cluster_state (version: 1695)
[DEBUG][cluster.service ] [sit-master-data-node-0] processing [reroute_after_cluster_update_settings]: execute
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Can not allocate [[servicepoint-2015-10-13][3], node[null], [R], s[UNASSIGNED]] on node [fmJY4Z4ISjmSEX8jbdsJ7A] due to [ReplicaAfterPrimaryActiveAllocationDecider]
I hope I have given the full information.
This is missing the disk-stats logging from the master node; you should see messages from this logging statement:
logger.trace("node: [{}], most available: total disk: {}, available disk: {} / least available: total disk: {}, available disk: {}", nodeId, mostAvailablePath.getTotal(), leastAvailablePath.getAvailable(), leastAvailablePath.getTotal(), leastAvailablePath.getAvailable());
Do you have those logs on the master node?
Hi,
I have given a few logs below.
[INFO ][cluster.service ] [sit-master-data-node-0] added {[sit-data-node-1][1WGmqNYBS4SJZUatz-3HTg][lus00080.lij.fi][inet[/xx.xxx.x.xx:9300]]{master=false},}, reason: zen-disco-receive(join from node[[sit-data-node-1][1WGmqNYBS4SJZUatz-3HTg][lus00080.lij.fi][inet[/xx.xxx.x.xx::9300]]{master=false}])
[DEBUG][cluster.service ] [sit-master-data-node-0] publishing cluster state version 3167
[DEBUG][cluster.service ] [sit-master-data-node-0] set local cluster state to version 3167
[DEBUG][cluster ] [sit-master-data-node-0] data node was added, retrieving new cluster info
[TRACE][cluster ] [sit-master-data-node-0] Performing ClusterInfoUpdateJob
[DEBUG][cluster.service ] [sit-master-data-node-0] processing [zen-disco-receive(join from node[[sit-data-node-1][1WGmqNYBS4SJZUatz-3HTg][lus00080.lij.fi][inet[/xx.xxx.x.xx::9300]]{master=false}])]: done applying updated cluster_state (version: 3167)
[TRACE][cluster ] [sit-master-data-node-0] node: [1WGmqNYBS4SJZUatz-3HTg], total disk: 5051023360, available disk: 4428247040
[TRACE][cluster ] [sit-master-data-node-0] node: [oL29yf7LQI2pxFJy09sYhg], total disk: 84413169664, available disk: 64780533760
[TRACE][cluster ] [sit-master-data-node-0] shard: [.kibana][0][p] size: 15846
[TRACE][cluster ] [sit-master-data-node-0] shard: [ces-2015-10-14][0][p] size: 103966
[TRACE][cluster ] [sit-master-data-node-0] shard: [ces-2015-10-15][0][p] size: 58566
[TRACE][cluster ] [sit-master-data-node-0] shard: [ces-2015-10-17][0][p] size: 29547
[TRACE][cluster.routing.allocation.allocator] [sit-master-data-node-0] Try relocating shard for index index [sales-2015-10-29] from node [oL29yf7LQI2pxFJy09sYhg] to node [EDtnrBZGROiV8TJ00I4wwA]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [EDtnrBZGROiV8TJ00I4wwA] would have less than the required 5gb free bytes threshold (4426862320 bytes free), preventing allocation
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Can not allocate [[sales-2015-10-29][0], node[oL29yf7LQI2pxFJy09sYhg], [R], s[STARTED]] on node [EDtnrBZGROiV8TJ00I4wwA] due to [DiskThresholdDecider]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [EDtnrBZGROiV8TJ00I4wwA] would have less than the required 5gb free bytes threshold (4426932652 bytes free), preventing allocation
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Can not allocate [[sales-2015-10-29][2], node[oL29yf7LQI2pxFJy09sYhg], [R], s[STARTED]] on node [EDtnrBZGROiV8TJ00I4wwA] due to [DiskThresholdDecider]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
Okay, it looks like it collected information about 2 of the nodes:
[TRACE][cluster ] [sit-master-data-node-0] node: [1WGmqNYBS4SJZUatz-3HTg], total disk: 5051023360, available disk: 4428247040
[TRACE][cluster ] [sit-master-data-node-0] node: [oL29yf7LQI2pxFJy09sYhg], total disk: 84413169664, available disk: 64780533760
However, the EDtnrBZGROiV8TJ00I4wwA node is the one actually having the allocation problem. See:
[TRACE][cluster.routing.allocation.allocator] [sit-master-data-node-0] Try relocating shard for index index [sales-2015-10-29] from node [oL29yf7LQI2pxFJy09sYhg] to node [EDtnrBZGROiV8TJ00I4wwA]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage without relocations: [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] usage with relocations: [0 bytes] [EDtnrBZGROiV8TJ00I4wwA][sit-data-node-1] free: 4.1gb[87.6%]
[TRACE][cluster.routing.allocation.decider] [sit-master-data-node-0] Node [EDtnrBZGROiV8TJ00I4wwA] has 87.67037529598754% free disk
[WARN ][cluster.routing.allocation.decider] [sit-master-data-node-0] After allocating, node [EDtnrBZGROiV8TJ00I4wwA] would have less than the required 5gb free bytes threshold (4426862320 bytes free), preventing allocation
EDtnrBZGROiV8TJ00I4wwA has 4.1gb of free disk and the limit has been set to 5gb, so it cannot allocate the shard there.
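(As a quick sanity check on the numbers: 4426862320 bytes ÷ 1024³ ≈ 4.12 GiB, which is indeed below the 5gb threshold the logs above show being applied.)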
It should have calculated the amount of space for this node as well; do you have a log line that looks like:
[TRACE][cluster ] [sit-master-data-node-0] node: [EDtnrBZGROiV8TJ00I4wwA], total disk: NNNNNNN, available disk: MMMMMMM
Where NNNNNNN and MMMMMMM are numbers?
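If it helps, something like this on the master node should pull those lines out (assuming the default log directory for your install; adjust the path to wherever your logs live):
# grep "total disk" /var/log/elasticsearch/*.log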