Hi All,
I have a two-node cluster that suddenly went down. The ES version is 5.6, each node has 16 GB of RAM, and the heap size is set to 8 GB. I am sharing my node1 log below; please help me find a solution.
[2020-08-30T00:30:09,917][WARN ][o.e.c.r.a.DiskThresholdMonitor] [my_prodnode1] high disk watermark [90%] exceeded on [JCHa7NT_TBuEGi-5Sy7cpQ][my_prodnode2][/var/lib/elasticsearch/nodes/0] free: 204kb[0%], shards will be relocated away from this node
[2020-08-29T10:00:05,125][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode1] [gc][3106046] overhead, spent [268ms] collecting in the last [1s]
[2020-08-29T10:00:06,126][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode1] [gc][3106047] overhead, spent [274ms] collecting in the last [1s]
[2020-08-29T20:29:01,104][DEBUG][o.e.a.b.TransportShardBulkAction] [my_prodnode1] [myregindex1][0] failed to execute bulk item (update) BulkShardRequest [[myregindex1][0]] containing [org.elasticsearch.action.update.UpdateRequest@5f12f396]
org.elasticsearch.index.engine.DocumentMissingException: [znl][E692DFD2-6CB1-4DF6-91E0-82E50325B31B]: document missing
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_241]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_241]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_241]
[2020-08-29T23:50:05,006][WARN ][o.e.c.a.s.ShardStateAction] [my_prodnode1] [esendzlist][1] received shard failed for shard id [[esendzlist][1]], allocation id [DJWCJzJZQ6esChwDe6zZFA], primary term [0], message [shard failure, reason [refresh failed]], failure [IOException[No space left on device]]
[2020-08-29T23:50:12,282][DEBUG][o.e.a.b.TransportShardBulkAction] [my_prodnode1] [esendzlist][1] failed to execute bulk item (index) BulkShardRequest [[esendzlist][1]] containing [index {[esendzlist][ezlist][80_ESEND], source[n/a, actual length: [2.3mb], max length: 2kb]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [zlist]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:298) ~[elasticsearch-5.6.16.jar:5.6.16]
[2020-08-30T01:03:18,216][WARN ][o.e.c.r.a.DiskThresholdMonitor] [my_prodnode1] high disk watermark [90%] exceeded on [JCHa7NT_TBuEGi-5Sy7cpQ][my_prodnode2][/var/lib/elasticsearch/nodes/0] free: 192kb[0%], shards will be relocated away from this node
[2020-08-30T01:04:01,405][WARN ][o.e.i.e.Engine ] [my_prodnode1] [sdkversion][1] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
[2020-08-30T02:11:40,451][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode1] [gc][22] overhead, spent [325ms] collecting in the last [1s]
[2020-08-30T02:11:50,232][DEBUG][o.e.a.s.TransportSearchAction] [my_prodnode1] All shards failed for phase: [query]
[2020-08-30T02:11:50,233][WARN ][r.suppressed ] path: /myregindex1/znl/_search, params: {index=myregindex1, type=znl}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
Thanks Steve, but the hard disk has enough space. Are you talking about RAM? The RAM size is currently 16 GB and the heap is set to 8 GB, so do I need to increase the RAM?
[2020-08-31T05:02:26,732][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97025] overhead, spent [323ms] collecting in the last [1s]
[2020-08-31T05:02:27,743][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97026] overhead, spent [260ms] collecting in the last [1s]
[2020-08-31T05:02:29,756][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97028] overhead, spent [322ms] collecting in the last [1s]
[2020-08-31T05:02:30,756][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97029] overhead, spent [331ms] collecting in the last [1s]
[2020-08-31T05:02:31,759][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97030] overhead, spent [329ms] collecting in the last [1s]
[2020-08-31T05:02:32,769][INFO ][o.e.m.j.JvmGcMonitorService] [my_prodnode2] [gc][97031] overhead, spent [320ms] collecting in the last [1s]
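Those GC lines are only INFO-level and show a few hundred milliseconds of collection per second, so the heap does not look like the problem; the watermark warnings and the "No space left on device" failures are about the disk holding the data path on my_prodnode2. To see the disk exactly as Elasticsearch sees it (rather than what df reports), something like the two calls below should help; localhost:9200 is just a placeholder for your own endpoint:

curl -s 'http://localhost:9200/_cat/allocation?v&h=node,shards,disk.used,disk.avail,disk.total,disk.percent'   # per-node disk usage as the DiskThresholdMonitor sees it
curl -s 'http://localhost:9200/_nodes/stats/fs?human&pretty'                                                   # filesystem stats for each node's actual data path

If disk.avail for my_prodnode2 is a few hundred kb, as the watermark message suggests, the cluster really is out of disk there regardless of what df shows on other mounts.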
Yeah, but his df -h shows free space, which is VERY weird.
It could be some temp-directory issue, but your /tmp has space; in fact you have space all over.
I suggest you sudo to the ES user and see what it can see. It could be a quota or some other odd permission issue, or maybe you are on Docker or have an unusual disk setup, even NFS; but since you are mounted from /dev/sda3 that would be very odd. I wonder if that's a SAN device or something.
I suggest making SURE you know your data path and that it has space from that user's perspective, for example with the commands below.
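This is just a sketch, assuming the default /var/lib/elasticsearch data path and an elasticsearch service user (adjust both if your setup differs), run on my_prodnode2:

sudo -u elasticsearch df -h /var/lib/elasticsearch    # free blocks as the ES user sees them
sudo -u elasticsearch df -i /var/lib/elasticsearch    # inode usage; "No space left on device" can also mean the filesystem is out of inodes
sudo -u elasticsearch touch /var/lib/elasticsearch/write_test && sudo -u elasticsearch rm /var/lib/elasticsearch/write_test    # can that user actually write there?
quota -u elasticsearch    # only meaningful if disk quotas are enabled on the filesystem

If df -i shows 100% IUse%, you have run out of inodes even though df -h shows free blocks, which would produce exactly these errors.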