Hi Everyone,
I'm stumped. Our Elasticsearch cluster went down and all shards were left in the unassigned state. We restarted the services, and as the shards started being relocated things were fine up to a certain point, but the node crashes after an hour or two with the following errors:
[2017-08-01T15:28:43,733][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][old][17873][562] duration [24.7s], collections [1]/[24.8s], total [24.7s]/[7.7m], memory [9.9gb]->[9.7gb]/[9.9gb], all_pools {[young] [532.5mb]->[379.2mb]/[532.5mb]}{[survivor] [59.2mb]->[0b]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-08-01T15:28:43,734][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17873] overhead, spent [24.7s] collecting in the last [24.8s]
[2017-08-01T15:28:44,734][INFO ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17874] overhead, spent [435ms] collecting in the last [1s]
[2017-08-01T15:29:42,373][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][old][17876][564] duration [25s], collections [1]/[25.8s], total [25s]/[8.6m], memory [9.7gb]->[9.8gb]/[9.9gb], all_pools {[young] [425.9mb]->[473.7mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-08-01T15:29:42,373][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17876] overhead, spent [25s] collecting in the last [25.8s]
[2017-08-01T15:30:14,533][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][old][17877][565] duration [31.4s], collections [1]/[32.1s], total [31.4s]/[9.1m], memory [9.8gb]->[9.8gb]/[9.9gb], all_pools {[young] [473.7mb]->[508.9mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-08-01T15:30:14,534][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17877] overhead, spent [31.4s] collecting in the last [32.1s]
[2017-08-01T15:30:38,120][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][old][17878][566] duration [22.9s], collections [1]/[23.1s], total [22.9s]/[9.5m], memory [9.8gb]->[9.8gb]/[9.9gb], all_pools {[young] [508.9mb]->[532.5mb]/[532.5mb]}{[survivor] [0b]->[622.3kb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-08-01T15:30:38,120][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17878] overhead, spent [22.9s] collecting in the last [23.1s]
[2017-08-01T15:31:10,660][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][old][17879][567] duration [31.8s], collections [1]/[32.9s], total [31.8s]/[10m], memory [9.8gb]->[9.8gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [622.3kb]->[30.3mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-08-01T15:31:10,660][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17879] overhead, spent [31.8s] collecting in the last [32.9s]
[2017-08-01T15:31:36,182][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][old][17880][568] duration [25.1s], collections [1]/[25.5s], total [25.1s]/[10.5m], memory [9.8gb]->[9.9gb]/[9.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [30.3mb]->[53.2mb]/[66.5mb]}{[old] [9.3gb]->[9.3gb]/[9.3gb]}
[2017-08-01T15:31:36,182][WARN ][o.e.m.j.JvmGcMonitorService] [xx.xx.xx.xx] [gc][17880] overhead, spent [25.1s] collecting in the last [25.5s]
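To make the pattern in those GC lines easier to see, here is a rough Python sketch that pulls two figures out of each "[gc][old]" line: the share of the reporting window spent in the old-generation collection, and how full the old generation still is afterwards. The regex is an assumption written against these exact lines only and may not match other Elasticsearch versions' output.

```python
import re

# Matches the old-generation "[gc][old]" WARN lines from JvmGcMonitorService.
# Written against the lines in this post; other versions may format them differently.
OLD_GC = re.compile(
    r"\[gc\]\[old\]\[\d+\]\[\d+\] "
    r"duration \[(?P<dur>[\d.]+)(?P<dur_u>ms|s|m)\], "
    r"collections \[\d+\]/\[(?P<win>[\d.]+)(?P<win_u>ms|s|m)\].*?"
    r"\{\[old\] \[[\d.]+[kmg]?b\]->"
    r"\[(?P<after>[\d.]+)(?P<after_u>[kmg]?b)\]/"
    r"\[(?P<max>[\d.]+)(?P<max_u>[kmg]?b)\]\}"
)

SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}
BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def summarize(line):
    """Return the GC pause, its share of the window, and old-gen fill, or None."""
    m = OLD_GC.search(line)
    if m is None:
        return None
    pause = float(m["dur"]) * SECONDS[m["dur_u"]]
    window = float(m["win"]) * SECONDS[m["win_u"]]
    old_after = float(m["after"]) * BYTES[m["after_u"]]
    old_max = float(m["max"]) * BYTES[m["max_u"]]
    return {
        "pause_s": pause,
        "gc_share": pause / window,       # fraction of the window spent collecting
        "old_fill": old_after / old_max,  # old-gen occupancy after the collection
    }


line = ("[gc][old][17873][562] duration [24.7s], collections [1]/[24.8s], "
        "total [24.7s]/[7.7m], memory [9.9gb]->[9.7gb]/[9.9gb], all_pools "
        "{[young] [532.5mb]->[379.2mb]/[532.5mb]}{[survivor] [59.2mb]->[0b]/[66.5mb]}"
        "{[old] [9.3gb]->[9.3gb]/[9.3gb]}")
print(summarize(line))
```

For the first line of the log this gives a GC share of about 0.996 and an old-gen fill of 1.0, i.e. the collector is running almost continuously and freeing nothing from the old generation.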
[2017-08-01T15:37:27,404][ERROR][o.e.t.n.Netty4Utils ] fatal error on the network layer
at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:140)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:851)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:527)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:745)
OS: Ubuntu 14.04
Java: 1.8
Heap size: 10G
Node type: Data
I have no clue what's going wrong.