Low Watermark disk space

I keep getting a low disk watermark error.

I checked my ES config file; there are currently no watermark settings in it.

I have 220 TB free.

What are the best settings to apply?

220TB?! That's a lot of space. Or a typo. Can you confirm this is correct?

Can you share the log messages about the low disk space? They will contain the details needed to help answer your question.

Hi, please see the output below, as I can't attach an image.

[2019-04-24T09:14:52,175][WARN ][o.e.c.r.a.DiskThresholdMonitor] [RiaPHCU] high disk watermark [90%] exceeded on [RiaPHCU-TXem5XQoSvvXXA][RiaPHCU][H:\ELK\elasticsearch-6.3.0\data\nodes\0] free: 205.5tb[9.3%], shards will be relocated away from this node
[2019-04-24T09:14:52,190][INFO ][o.e.c.r.a.DiskThresholdMonitor] [RiaPHCU] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2019-04-24T09:15:23,407][WARN ][o.e.c.r.a.DiskThresholdMonitor] [RiaPHCU] high disk watermark [90%] exceeded on [RiaPHCU-TXem5XQoSvvXXA][RiaPHCU][H:\ELK\elasticsearch-6.3.0\data\nodes\0] free: 205.5tb[9.3%], shards will be relocated away from this node
[2019-04-24T09:15:55,309][WARN ][o.e.c.r.a.DiskThresholdMonitor] [RiaPHCU] high disk watermark [90%] exceeded on [RiaPHCU-TXem5XQoSvvXXA][RiaPHCU][H:\ELK\elasticsearch-6.3.0\data\nodes\0] free: 205.5tb[9.3%], shards will be relocated away from this node
[2019-04-24T09:15:55,325][INFO ][o.e.c.r.a.DiskThresholdMonitor] [RiaPHCU] rerouting shards: [high disk watermark exceeded on one or more nodes]

In terms of disk space, yes, that's correct. When I run dir on my H drive, the free space stated is: 226,025,176,737,792 bytes free.

@DavidTurner please see above reply.

What is the best way to fix this issue? Is there any config required to adjust the high disk watermark threshold? With this error Elasticsearch crashes, Kibana cannot connect, and Logstash cannot run and ingest data.

Ok, just to be clear, this seems to be saying that you have a single filesystem with over 2PB of space on it. Is that right?
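To show where that estimate comes from, here's the arithmetic as a quick Python sketch, using only the numbers already posted above (the 9.3% figure and the byte count from dir):

```python
# The dir output and the watermark log line describe the same disk:
# 226,025,176,737,792 bytes is about 205.6 TiB (what the log calls "205.5tb"),
# and the log says that free space is only 9.3% of the filesystem.
free_bytes = 226_025_176_737_792
free_tib = free_bytes / 2**40
total_tib = free_tib / 0.093          # free space is 9.3% of the total

print(round(free_tib, 1))   # ~205.6 TiB free
print(round(total_tib))     # ~2210 TiB total, i.e. over 2 PiB
```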

By default Elasticsearch becomes unhappy when a filesystem is over 85% full, with increasing levels of unhappiness at 90% and 95%. These levels are reasonable in many cases, but it's pretty unusual to have a multi-petabyte filesystem. You can adjust these levels using the settings for the disk-based shard allocator.
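For reference, those defaults correspond to something like the following in elasticsearch.yml (shown only to illustrate the setting names; these settings accept either a percentage used or an absolute free-space value):

```yaml
cluster.routing.allocation.disk.watermark.low: "85%"
cluster.routing.allocation.disk.watermark.high: "90%"
cluster.routing.allocation.disk.watermark.flood_stage: "95%"
```

They can also be changed at runtime through the cluster settings API rather than editing the config file and restarting.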

However, I do not think these watermarks will cause Elasticsearch to crash as you describe. Can you share the log messages from such a crash?

@DavidTurner yes that is correct.

Thanks, I'll look into adding cluster threshold settings in elasticsearch.yml. Something like the below should help.

cluster.routing.allocation.disk.watermark.low: "100gb"
cluster.routing.allocation.disk.watermark.high: "50gb"
cluster.routing.allocation.disk.watermark.flood_stage: "10gb"
cluster.info.update.interval: "1m"
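I might also try applying the same values at runtime via the cluster settings API instead of restarting (a sketch; it assumes the node is listening on localhost:9200):

```shell
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}'
```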

Second point: while I was running ES, Kibana gave me this error:

  log   [08:44:06.386] [warning][kibana-monitoring][monitoring-ui] Unable to fetch data from kibana_settings collector
  error [08:44:06.402] [warning][kibana-monitoring][monitoring-ui] [search_phase_execution_exception] all shards failed :: {"path":"/.kibana/_search","query":{"ignore_unavailable":true,"filter_path":"aggregations.types.buckets"},"body":"{\"size\":0,\"query\":{\"terms\":{\"type\":[\"dashboard\",\"visualization\",\"search\",\"index-pattern\",\"graph-workspace\",\"timelion-sheet\"]}},\"aggs\":{\"types\":{\"terms\":{\"field\":\"type\",\"size\":6}}}}","statusCode":503,"response":"{\"error\":{\"root_cause\":[],\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[]},\"status\":503}"}
    at respond (H:\ELK\kibana-6.3.0-windows-x86_64\node_modules\elasticsearch\src\lib\transport.js:307:15)
    at checkRespForFailure (H:\ELK\kibana-6.3.0-windows-x86_64\node_modules\elasticsearch\src\lib\transport.js:266:7)
    at HttpConnector.<anonymous> (H:\ELK\kibana-6.3.0-windows-x86_64\node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
    at IncomingMessage.bound (H:\ELK\kibana-6.3.0-windows-x86_64\node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
  log   [08:44:06.417] [warning][kibana-monitoring][monitoring-ui] Unable to fetch data from kibana collector

Is this related to the watermark issue? ES is also reporting that collectors timed out when collecting.

Those values will indeed make the warnings go away. I can't say if they're too small, it depends on how quickly you're ingesting data. If Elasticsearch hits the high watermark then it needs enough time to move shards around before hitting the flood stage watermark.
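As a rough way to think about it, here's a sketch of that calculation (the ingest rate here is a made-up placeholder; substitute your own):

```python
# Proposed watermarks: high triggers at 50 GB free, flood stage at 10 GB
# free, leaving 40 GB of runway for Elasticsearch to relocate shards.
high_free_gb = 50
flood_free_gb = 10
ingest_gb_per_hour = 20   # placeholder; use your actual ingest rate

runway_gb = high_free_gb - flood_free_gb
hours_until_flood = runway_gb / ingest_gb_per_hour
print(hours_until_flood)  # 2.0 hours with these example numbers
```

If that window is shorter than the time it takes shards to relocate, the absolute values are too small for your ingest rate.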

No, that's not something that you'd get from hitting a watermark, but the Kibana logs don't really tell us much except all shards failed. Did Elasticsearch log anything more useful at this time?

So the ES output for the Kibana error is:

    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2019-04-24T10:40:14,234][WARN ][r.suppressed ] path: /.kibana/_search, params: {ignore_unavailable=true, index=.kibana, filter_path=aggregations.types.buckets}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:288) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:128) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:249) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:101) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:210) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:189) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

I'm not sure that's what you meant to post: it seems to start in the middle of a stack trace. Could you fix up your message or try again? Use the </> button to format logs properly, and check the preview window to make sure it looks correct before posting.

Hi, sorry for that. I am receiving different error messages which I don't understand; please see below, I hope this makes sense. The output is from ES at the time Kibana gives the same error discussed under our second point. I can only provide the last section of the ES output due to the character count limit.

    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2019-04-24T11:34:44,923][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [RiaPHCU] collector [cluster_stats] timed out when collecting data
[2019-04-24T11:34:49,555][DEBUG][o.e.a.s.TransportSearchAction] [RiaPHCU] All shards failed for phase: [query]
[2019-04-24T11:34:49,562][WARN ][r.suppressed             ] path: /.kibana/_search, params: {size=10000, index=.kibana, from=0}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:288) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:128) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:249) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:101) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:210) ~[elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:189) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.3.0.jar:6.3.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2019-04-24T11:34:55,022][ERROR][o.e.x.m.c.n.NodeStatsCollector] [RiaPHCU] collector [node_stats] timed out when collecting data
[2019-04-24T11:35:14,932][ERROR][o.e.x.m.c.i.IndexStatsCollector] [RiaPHCU] collector [index-stats] timed out when collecting data
[2019-04-24T11:35:24,949][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [RiaPHCU] collector [cluster_stats] timed out when collecting data

OK, there are lots of indications that something is wrong in your cluster, and it doesn't look like anything to do with disk watermarks. But these messages are still only a tiny part of the picture: they cover less than a minute of elapsed time and the stack traces are truncated. Are you copying them from a console? Can you share the actual log file instead?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.