Cluster down on 2-node server: high disk watermark

Hi All,
I have a two-node cluster that suddenly went down. The ES version is 5.6; each node has 16 GB of RAM with an 8 GB heap. I am sharing my node1 log, please help me find a solution.

[2020-08-30T00:30:09,917][WARN ][o.e.c.r.a.DiskThresholdMonitor] [my_prodnode1] high disk watermark [90%] exceeded on [JCHa7NT_TBuEGi-5Sy7cpQ][my_prodnode2][/var/lib/elasticsearch/nodes/0] free: 204kb[0%], shards will be relocated away from this node

[2020-08-29T10:00:05,125][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode1] [gc][3106046] overhead, spent [268ms] collecting in the last [1s]
[2020-08-29T10:00:06,126][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode1] [gc][3106047] overhead, spent [274ms] collecting in the last [1s]

[2020-08-29T20:29:01,104][DEBUG][o.e.a.b.TransportShardBulkAction] [my_prodnode1] [myregindex1][0] failed to execute bulk item (update) BulkShardRequest [[myregindex1][0]] containing [org.elasticsearch.action.update.UpdateRequest@5f12f396]
org.elasticsearch.index.engine.DocumentMissingException: [znl][E692DFD2-6CB1-4DF6-91E0-82E50325B31B]: document missing

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_241]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_241]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_241]


[2020-08-29T23:50:05,006][WARN ][o.e.c.a.s.ShardStateAction] [my_prodnode1] [esendzlist][1] received shard failed for shard id [[esendzlist][1]], allocation id [DJWCJzJZQ6esChwDe6zZFA], primary term [0], message [shard failure, reason [refresh failed]], failure [IOException[No space left on device]]


[2020-08-29T23:50:12,282][DEBUG][o.e.a.b.TransportShardBulkAction] [my_prodnode1] [esendzlist][1] failed to execute bulk item (index) BulkShardRequest [[esendzlist][1]] containing [index {[esendzlist][ezlist][80_ESEND], source[n/a, actual length: [2.3mb], max length: 2kb]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [zlist]
        at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:298) ~[elasticsearch-5.6.16.jar:5.6.16]


[2020-08-30T01:03:18,216][WARN ][o.e.c.r.a.DiskThresholdMonitor] [my_prodnode1] high disk watermark [90%] exceeded on [JCHa7NT_TBuEGi-5Sy7cpQ][my_prodnode2][/var/lib/elasticsearch/nodes/0] free: 192kb[0%], shards will be relocated away from this node

[2020-08-30T01:04:01,405][WARN ][o.e.i.e.Engine           ] [my_prodnode1] [sdkversion][1] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device



[2020-08-30T02:11:40,451][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode1] [gc][22] overhead, spent [325ms] collecting in the last [1s]
[2020-08-30T02:11:50,232][DEBUG][o.e.a.s.TransportSearchAction] [my_prodnode1] All shards failed for phase: [query]
[2020-08-30T02:11:50,233][WARN ][r.suppressed             ] path: /myregindex1/znl/_search, params: {index=myregindex1, type=znl}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed

Sure looks like you are out of disk space ... only 192KB.

Thanks Steve, but the hard disk has enough space. Are you talking about RAM? Currently RAM is 16 GB and the heap is 8 GB, so do I need to increase RAM?
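For what it's worth, the high disk watermark is about free disk space on the data path, not RAM or heap. A quick way to compare the two on each node (the path is the 5.x DEB/RPM default; adjust if yours differs):

```shell
# Default path.data for DEB/RPM installs of Elasticsearch 5.x; adjust if
# your install differs.
DATA_PATH=/var/lib/elasticsearch

# Disk space on the data path is what the watermark monitors
# (fall back to / if that path does not exist on this machine).
df -h "$DATA_PATH" 2>/dev/null || df -h /

# RAM and heap are a separate resource entirely; the disk watermark
# never looks at them.
free -h
```

So a 16 GB RAM node can still trip the watermark if the filesystem holding the data path is full.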

Elasticsearch is definitely seeing that there's not much space left.

What does df -h show?

Hi warkolm, please find the `df -h` output for both nodes below.

**Node 1:**
Filesystem                 Size  Used Avail Use% Mounted on
udev                       7.9G     0  7.9G   0% /dev
tmpfs                      1.6G  169M  1.4G  11% /run
/dev/sda4                   52G  2.0G   48G   4% /
tmpfs                      7.9G     0  7.9G   0% /dev/shm
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                      7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/sda3                  465M   50M  387M  12% /boot
/dev/sda1                   19G  783M   17G   5% /var
/dev/mapper/vg_u01-lv_u01  492G  8.7G  458G   2% /u01
/dev/sdc                   2.0T   18G  1.9T   1% /nfs
tmpfs                      1.6G     0  1.6G   0% /run/user/1002
tmpfs                      1.6G     0  1.6G   0% /run/user/1003

**Node 2:**
Filesystem                 Size  Used Avail Use% Mounted on
udev                       7.9G     0  7.9G   0% /dev
tmpfs                      1.6G  169M  1.4G  11% /run
/dev/sda4                   52G  2.0G   48G   4% /
tmpfs                      7.9G     0  7.9G   0% /dev/shm
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                      7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/sda1                   19G  724M   17G   5% /var
/dev/sda3                  465M   50M  387M  12% /boot
/dev/mapper/vg_u01-lv_u01  492G  8.8G  458G   2% /u01
10.201.201.63:/nfs         2.0T   18G  1.9T   1% /nfs
tmpfs                      1.6G     0  1.6G   0% /run/user/1002
tmpfs                      1.6G     0  1.6G   0% /run/user/1003


Did you change the default path.data?
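If `path.data` is unchanged, it may be worth asking the cluster what *it* sees rather than the OS. A sketch, assuming a node still answers on `localhost:9200` (adjust host/port to your setup):

```shell
# Which data paths each node is actually using, and the free space
# Elasticsearch itself reports on them.
curl -s 'localhost:9200/_nodes/stats/fs?pretty'

# Per-node disk usage as the shard allocator sees it.
curl -s 'localhost:9200/_cat/allocation?v' || true
```

If the paths or free-space figures here disagree with `df -h`, that points at a quota, mount, or permission problem rather than a genuinely full disk.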

I am also getting:

[2020-08-31T05:02:26,732][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97025] overhead, spent [323ms] collecting in the last [1s]
[2020-08-31T05:02:27,743][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97026] overhead, spent [260ms] collecting in the last [1s]
[2020-08-31T05:02:29,756][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97028] overhead, spent [322ms] collecting in the last [1s]
[2020-08-31T05:02:30,756][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97029] overhead, spent [331ms] collecting in the last [1s]
[2020-08-31T05:02:31,759][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97030] overhead, spent [329ms] collecting in the last [1s]
[2020-08-31T05:02:32,769][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97031] overhead, spent [320ms] collecting in the last [1s]

That is fine, as it says it's an info, not a warn.

No, it's the default setting. The NFS mount is used for taking snapshots. Is the server reporting about RAM or about hard disk space?

Hi,

You are running out of disk space. Fix this issue first.
No space left on device

Regards

Dominique
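If the data path really is full, the usual first aid is to free space, or to temporarily relax the watermarks while you investigate. A sketch, assuming the cluster still responds on `localhost:9200`; revert the transient settings once the disk problem is fixed:

```shell
# Temporarily raise the disk watermarks (5.x setting names) so allocation
# can recover while space is freed on the data path.
curl -s -XPUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "95%",
    "cluster.routing.allocation.disk.watermark.high": "97%"
  }
}' || true
```

This only buys time; with 204 KB free the node will hit "No space left on device" again almost immediately unless something is deleted or moved.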

Yeah, but his `df -h` shows space, which is VERY weird.

It could be some temp issue, but your /tmp has space; in fact you have space all over.

Suggest sudo-ing to the ES user and seeing what it can see. It could be a quota or some other weird permission issue, or maybe you are on Docker or have an unusual disk setup, even NFS; but you are mounted from /dev/sda3, so it's very odd. I wonder if that's a SAN device or something.

Suggest making SURE you know your data path and that it has space from that user's perspective.
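As a concrete sketch of that sudo check (user and path are the 5.x package defaults; `-n` just makes sudo fail instead of prompting):

```shell
# View the data path as the elasticsearch user -- a quota or permission
# problem can hide space that root sees.
sudo -n -u elasticsearch df -h /var/lib/elasticsearch 2>/dev/null || true
sudo -n -u elasticsearch ls -ld /var/lib/elasticsearch/nodes/0 2>/dev/null || true

# If quota tools are installed, check for a per-user quota on the mount.
quota -u elasticsearch 2>/dev/null || true
```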

Thanks Steve_Mushero for your kind attention.

I am not using Docker, and the space is already there. I think this space information is coming from RAM. Am I right?

No, it's disk, not RAM. Suggest trying some of the suggestions above, like sudo-ing to the ES user and carefully finding the paths it's using, etc. All weird.