I hit "fatal error on the network layer" and "fatal error in thread" errors, and the Elasticsearch process dies.
The error log is as follows:
[2017-10-31T09:04:09,689][ERROR][o.e.t.n.Netty4Utils ] fatal error on the network layer
at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:179)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
[2017-10-31T09:04:09,703][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [host] fatal error in thread [elasticsearch[host][search][T#4]], exiting
java.lang.StackOverflowError: null
at org.elasticsearch.action.search.InitialSearchPhase.skipShard(InitialSearchPhase.java:323) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.skipShard(AbstractSearchAsyncAction.java:321) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.maybeExecuteNext(InitialSearchPhase.java:147) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:207) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.skipShard(InitialSearchPhase.java:323) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.skipShard(AbstractSearchAsyncAction.java:321) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.maybeExecuteNext(InitialSearchPhase.java:147) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:207) ~[elasticsearch-5.6.2.jar:5.6.2]
<<The rest is omitted>>
The full log is available as a gist: https://gist.github.com/moznion/d6727e00a06467b053d941a74b1c745e
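The repeated skipShard -> successfulShardExecution -> maybeExecuteNext frames suggest the initial search phase walks the shard list recursively, adding a round of frames per skipped shard, so a large shard count can exhaust the 1 MB thread stack configured below (-Xss1m). The following is a minimal, hypothetical Java sketch of that failure shape; it is not Elasticsearch's actual code, and the method names only mirror the frames in the trace:

// Hypothetical sketch (not Elasticsearch source): mutual recursion that adds
// one round of frames per skipped shard. Run with "java -Xss1m SkipRecursionSketch"
// to mirror the stack budget used here; with a larger stack it may complete.
public class SkipRecursionSketch {

    static final int SHARDS = 8_364; // total shard count reported under Metrics below
    static int visited = 0;

    // same shape as InitialSearchPhase.skipShard(...)
    static void skipShard(int shard) {
        successfulShardExecution(shard);
    }

    // same shape as InitialSearchPhase.successfulShardExecution(...)
    static void successfulShardExecution(int shard) {
        visited++;
        maybeExecuteNext(shard + 1);
    }

    // same shape as InitialSearchPhase.maybeExecuteNext(...)
    static void maybeExecuteNext(int shard) {
        if (shard < SHARDS) {
            skipShard(shard); // recursion instead of iteration: depth grows with shard count
        }
    }

    public static void main(String[] args) {
        try {
            skipShard(0);
            System.out.println("visited all " + visited + " shards without overflow");
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError after ~" + visited + " shards");
        }
    }
}

Whether this toy actually overflows at 8,364 shards depends on frame size, but the real InitialSearchPhase frames carry more state than these, so the depth per megabyte of stack is likely lower in practice.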
Situation
When I access Kibana's Discover page, the connected Elasticsearch node dies with the errors above, and Kibana shows the banner "Discover: socket hang up".
Elasticsearch version
5.6.2
JVM Options
-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.allocator.type=unpooled
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintClassHistogram
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-Xloggc:{{ ELASTICSEARCH.LOG_DIR }}/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=32
-XX:GCLogFileSize=128M
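Of the options above, -Xss1m is the one the StackOverflowError runs into: it caps every thread's stack, including the search threads, at 1 MB. Below is a small, HotSpot-specific sketch (class name and setup are mine, not from the original post) to confirm the effective stack size and get a feel for how many trivial frames fit into it; run it with the same -Xss value as Elasticsearch:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Prints the effective ThreadStackSize (HotSpot reports it in KB; 0 means the
// platform default) and counts how many trivial frames fit before overflow.
public class StackBudget {

    static long depth = 0;

    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("ThreadStackSize = "
                + hotspot.getVMOption("ThreadStackSize").getValue() + " KB");
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("overflowed after ~" + depth + " trivial frames");
        }
    }
}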
Elasticsearch Options
cluster.name: "cluster"
node.name: ${HOSTNAME}
bootstrap.memory_lock: true
indices.fielddata.cache.size: '80%'
network:
  host: ["_local_", "_global_"]
discovery:
  zen:
    ping.unicast.hosts:
      - host1
      - host2
      - host3
    minimum_master_nodes: 2
Metrics
- Nodes: 3
- Indices: 4166
- Memory (JVM heap used / max): 14GB / 24GB
- Total Shards: 8364
- Unassigned Shards: 0
- Documents: 83,605,042
- Data: 71GB
The cluster has a single master node. (The figures above can be re-checked with the cluster stats API, as sketched below.)
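A minimal sketch for pulling those figures from the _cluster/stats endpoint, assuming a node reachable on localhost:9200 with no authentication (host, port, and class name are assumptions, not from the original post):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Dumps GET /_cluster/stats, which reports node count, index and shard counts,
// document totals, store size, and JVM heap usage for the whole cluster.
public class ClusterStatsDump {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/_cluster/stats?human&pretty");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            reader.lines().forEach(System.out::println);
        }
    }
}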
Question
Is there any workaround to avoid this problem?