Faced "fatal error on the network layer" and "fatal error in thread" error

I hit "fatal error on the network layer" and "fatal error in thread" errors, and the Elasticsearch process dies.

The error log is as follows:

[2017-10-31T09:04:09,689][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
        at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:179)
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
        at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at java.lang.Thread.run(Thread.java:748)
[2017-10-31T09:04:09,703][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [host] fatal error in thread [elasticsearch[host][search][T#4]], exiting
java.lang.StackOverflowError: null
        at org.elasticsearch.action.search.InitialSearchPhase.skipShard(InitialSearchPhase.java:323) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.skipShard(AbstractSearchAsyncAction.java:321) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.InitialSearchPhase.maybeExecuteNext(InitialSearchPhase.java:147) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:207) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.InitialSearchPhase.skipShard(InitialSearchPhase.java:323) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.skipShard(AbstractSearchAsyncAction.java:321) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.InitialSearchPhase.maybeExecuteNext(InitialSearchPhase.java:147) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:207) ~[elasticsearch-5.6.2.jar:5.6.2]
<<The rest is omitted>>

The full log is in this gist: https://gist.github.com/moznion/d6727e00a06467b053d941a74b1c745e

Situation

When I access Kibana's Discover page, the connected Elasticsearch node dies with these errors (and Kibana shows the banner "Discover: socket hang up").
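
For reference, this is roughly the kind of request I believe the Discover page sends: a time-range query over the index pattern, sorted by time. The index pattern and timestamp field below are just placeholders for my setup, so treat this as an illustration only, but with thousands of shards behind the pattern most of them get skipped by the pre-filter phase, which seems to match the recursive skipShard calls in the stack trace above:

curl -XGET 'http://localhost:9200/logstash-*/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 500,
  "sort": [ { "@timestamp": { "order": "desc" } } ],
  "query": {
    "range": { "@timestamp": { "gte": "now-15m", "lte": "now" } }
  }
}'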

Elasticsearch version

5.6.2

JVM Options

-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.allocator.type=unpooled
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintClassHistogram
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-Xloggc:{{ ELASTICSEARCH.LOG_DIR }}/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=32
-XX:GCLogFileSize=128M
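
One thing I noticed is that the fatal error is a StackOverflowError and the stack size above is the default -Xss1m. I do not know whether it actually avoids this error, but a larger per-thread stack size can be set in jvm.options as an untested stopgap, for example:

# jvm.options: raise the per-thread stack size from the default 1m
# (untested stopgap, not a confirmed fix for this error)
-Xss2m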

Elasticsearch Options

cluster.name: "cluster"
node.name: ${HOSTNAME}
bootstrap.memory_lock: true
indices.fielddata.cache.size: '80%'
network:
  host: ["_local_", "_global_"]
discovery:
  zen:
    ping.unicast.hosts:
      - host1
      - host2
      - host3
    minimum_master_nodes: 2
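
The minimum_master_nodes value above follows the usual quorum formula for the three master-eligible hosts listed in ping.unicast.hosts:

minimum_master_nodes = (master_eligible_nodes / 2) + 1
                     = (3 / 2) + 1    (integer division)
                     = 2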

Metrics

  • Nodes: 3
  • Indices: 4166
  • Memory: 14GB / 24GB
  • Total Shards: 8364
  • Unassigned Shards: 0
  • Documents: 83,605,042
  • Data: 71GB

And the cluster has a single master node.
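
For reference, similar figures can be pulled directly from the cluster with the cluster stats and cat APIs, for example:

curl -XGET 'http://localhost:9200/_cluster/stats?human&pretty'
curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,master,heap.percent'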

Question

Is there any workaround to avoid this problem?

I'm sorry to tell you that this is a known issue; it will be fixed in 5.6.4. Details are here: #27609

Thank you for the information.
Could you let me know the release schedule for 5.6.4, if you know it?

As a matter of policy, we do not publish release dates, not even indicative ones, even with weasel words about how they could change. Sorry!

Ok, I see.
Thank you for your support.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.