I hit "fatal error on the network layer" and "fatal error in thread" errors, and the Elasticsearch process dies.
The error log is as follows:
[2017-10-31T09:04:09,689][ERROR][o.e.t.n.Netty4Utils ] fatal error on the network layer
at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:179)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
[2017-10-31T09:04:09,703][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [host] fatal error in thread [elasticsearch[host][search][T#4]], exiting
java.lang.StackOverflowError: null
at org.elasticsearch.action.search.InitialSearchPhase.skipShard(InitialSearchPhase.java:323) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.skipShard(AbstractSearchAsyncAction.java:321) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.maybeExecuteNext(InitialSearchPhase.java:147) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:207) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.skipShard(InitialSearchPhase.java:323) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.skipShard(AbstractSearchAsyncAction.java:321) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.maybeExecuteNext(InitialSearchPhase.java:147) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:207) ~[elasticsearch-5.6.2.jar:5.6.2]
<<The rest is omitted>>
The full log is available as a gist: https://gist.github.com/moznion/d6727e00a06467b053d941a74b1c745e
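The repeated skipShard -> successfulShardExecution -> maybeExecuteNext frames suggest the initial search phase walks the shard list recursively, adding a round of frames per skipped shard, so a large shard count can exhaust the 1 MB thread stack configured below (-Xss1m). The following is a minimal, hypothetical Java sketch of that failure shape; it is not Elasticsearch's actual code, and the method names only mirror the frames in the trace:

// Hypothetical sketch (not Elasticsearch source): mutual recursion that adds
// one round of frames per skipped shard. Run with "java -Xss1m SkipRecursionSketch"
// to mirror the stack budget used here; with a larger stack it may complete.
public class SkipRecursionSketch {

    static final int SHARDS = 8_364; // total shard count reported under Metrics below
    static int visited = 0;

    // same shape as InitialSearchPhase.skipShard(...)
    static void skipShard(int shard) {
        successfulShardExecution(shard);
    }

    // same shape as InitialSearchPhase.successfulShardExecution(...)
    static void successfulShardExecution(int shard) {
        visited++;
        maybeExecuteNext(shard + 1);
    }

    // same shape as InitialSearchPhase.maybeExecuteNext(...)
    static void maybeExecuteNext(int shard) {
        if (shard < SHARDS) {
            skipShard(shard); // recursion instead of iteration: depth grows with shard count
        }
    }

    public static void main(String[] args) {
        try {
            skipShard(0);
            System.out.println("visited all " + visited + " shards without overflow");
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError after ~" + visited + " shards");
        }
    }
}

Whether this toy actually overflows at 8,364 shards depends on frame size, but the real InitialSearchPhase frames carry more state than these, so the depth per megabyte of stack is likely lower in practice.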
Situation
When I access Kibana's Discover page, the connected Elasticsearch node dies with the errors above, and Kibana shows the banner "Discover: socket hang up".
Elasticsearch version
5.6.2
JVM Options
-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.allocator.type=unpooled
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintClassHistogram
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-Xloggc:{{ ELASTICSEARCH.LOG_DIR }}/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=32
-XX:GCLogFileSize=128M
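Of the options above, -Xss1m is the one the StackOverflowError runs into: it caps every thread's stack, including the search threads, at 1 MB. Below is a small, HotSpot-specific sketch (class name and setup are mine, not from the original post) to confirm the effective stack size and get a feel for how many trivial frames fit into it; run it with the same -Xss value as Elasticsearch:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Prints the effective ThreadStackSize (HotSpot reports it in KB; 0 means the
// platform default) and counts how many trivial frames fit before overflow.
public class StackBudget {

    static long depth = 0;

    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("ThreadStackSize = "
                + hotspot.getVMOption("ThreadStackSize").getValue() + " KB");
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("overflowed after ~" + depth + " trivial frames");
        }
    }
}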
Elasticsearch Options
cluster.name: "cluster"
node.name: ${HOSTNAME}
bootstrap.memory_lock: true
indices.fielddata.cache.size: '80%'
network:
  host: ["_local_", "_global_"]
discovery:
  zen:
    ping.unicast.hosts:
      - host1
      - host2
      - host3
    minimum_master_nodes: 2
Metrics
- Nodes: 3
- Indices: 4166
- Memory (JVM heap used / max): 14GB / 24GB
- Total Shards: 8364
- Unassigned Shards: 0
- Documents: 83,605,042
- Data: 71GB
The cluster has a single master node. (The figures above can be re-checked with the cluster stats API, as sketched below.)
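A minimal sketch for pulling those figures from the _cluster/stats endpoint, assuming a node reachable on localhost:9200 with no authentication (host, port, and class name are assumptions, not from the original post):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Dumps GET /_cluster/stats, which reports node count, index and shard counts,
// document totals, store size, and JVM heap usage for the whole cluster.
public class ClusterStatsDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/_cluster/stats?human&pretty");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            reader.lines().forEach(System.out::println);
        }
    }
}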
Question
Is there any workaround to avoid this problem?