Importing data (bulk request) - fatal error on the network layer / GC overhead limit exceeded

Version 5.2.0
Running on Ubuntu
Memory min/max:
Master 1 GB / 1 GB
Data 2 GB / 2 GB

In recent days I have been getting "fatal error on the network layer" and "GC overhead limit exceeded" errors when inserting data into ES.

What is the best way to find out why this occurs?

I'm inserting about 1 million records every day, in 5 bulk requests run sequentially, with the following BulkProcessor setup:

        .setBulkActions(10000)
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))
        .setFlushInterval(TimeValue.timeValueSeconds(5))
        .setConcurrentRequests(1)
        .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
        .build();
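For completeness, this is roughly how the processor is wired up; the method and listener bodies below are a simplified sketch rather than my exact code:

    import org.elasticsearch.action.bulk.BackoffPolicy;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.unit.ByteSizeUnit;
    import org.elasticsearch.common.unit.ByteSizeValue;
    import org.elasticsearch.common.unit.TimeValue;

    // Sketch only: "client" is the TransportClient used for indexing.
    private BulkProcessor buildBulkProcessor(Client client) {
        return BulkProcessor.builder(
                client,
                new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long executionId, BulkRequest request) {
                        // e.g. log request.numberOfActions() before the bulk is sent
                    }

                    @Override
                    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                        // per-item failures can be inspected via response.hasFailures()
                    }

                    @Override
                    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                        // transport-level failures (like the ones in the logs below) end up here
                    }
                })
                .setBulkActions(10000)                              // flush after 10,000 actions...
                .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or after 5 MB of payload...
                .setFlushInterval(TimeValue.timeValueSeconds(5))    // ...or every 5 seconds
                .setConcurrentRequests(1)                           // one bulk in flight while the next accumulates
                .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
                .build();
    }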

Today's log:

2017-04-25 09:03:44,608 [elasticsearch[_client_][transport_client_boss][T#3]] WARN  o.e.t.TransportService Korrelasjonsid= Received response for a request that has timed out, sent [13601ms] ago, timed out [7917ms] ago, action [cluster:monitor/nodes/liveness], node [{#transport#-1}{ljIxQw1tQ_Sr7EFRL2Qa7g}{...], id [27529]
2017-04-25 09:03:59,449 [elasticsearch[_client_][generic][T#3]] INFO  o.e.c.t.TransportClientNodesService Korrelasjonsid= failed to get node info for {#transport#-1}{ljIxQw1tQ_Sr7EFRL2Qa7g}{...}{...}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][....][cluster:monitor/nodes/liveness] request_id [27529] timed out after [5684ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:908)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:527)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Yesterday's log:

2017-04-24 14:27:28,064 [elasticsearch[_client_][transport_client_boss][T#1]] ERROR o.e.t.n.Netty4Utils Korrelasjonsid= fatal error on the network layer
	at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:140)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:851)
	....
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:527)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.lang.Thread.run(Thread.java:745)
2017-04-24 14:27:30,019 [elasticsearch[_client_][generic][T#1]] INFO  o.e.c.t.TransportClientNodesService Korrelasjonsid= failed to get node info for {#transport#-1}{Ayn44paGRqmiq27m_zwKVQ}{tsl0sofus-at-oppdrag01}{10.1.5.152:12500}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][10.1.5.152:12500][cluster:monitor/nodes/liveness] request_id [248] timed out after [47400ms]
	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:908)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:527)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

2017-04-24 14:27:30,020 [elasticsearch[_client_][transport_client_boss][T#1]] WARN  o.e.t.n.Netty4Transport Korrelasjonsid= exception caught on transport layer [[id: 0xe86b4f6e, L:/10.1.5.152:52010 - R:tsl0sofus-at-oppdrag02/10.1.13.77:12500]], closing connection
org.elasticsearch.ElasticsearchException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.elasticsearch.transport.netty4.Netty4Transport.exceptionCaught(Netty4Transport.java:332)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:84)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:286)
            ...
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at org.apache.lucene.util.CharsRef.toString(CharsRef.java:154)
	at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:373)
	at org.elasticsearch.index.Index.<init>(Index.java:63)
	at org.elasticsearch.index.shard.ShardId.readFrom(ShardId.java:101)
	at org.elasticsearch.index.shard.ShardId.readShardId(ShardId.java:95)
	at org.elasticsearch.action.DocWriteResponse.readFrom(DocWriteResponse.java:208)
	at org.elasticsearch.action.bulk.BulkItemResponse.readFrom(BulkItemResponse.java:307)
	at org.elasticsearch.action.bulk.BulkItemResponse.readBulkItem(BulkItemResponse.java:296)
	at org.elasticsearch.action.bulk.BulkResponse.readFrom(BulkResponse.java:128)
	at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:1372)
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1347)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)

If you want large bulk requests, such as 10,000 documents or 5 MB per request, the simplest ways to cope are to increase the heap size or add more data nodes.
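For the heap part, on 5.x that means raising -Xms/-Xmx in config/jvm.options on the data nodes, for example (values are only illustrative; keep min and max equal and within available RAM):

    # config/jvm.options on each data node (example values, not a recommendation)
    -Xms4g
    -Xmx4g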

If you can't, you could try decreasing the number of documents per bulk request, or increasing bulk concurrency, until your cluster is able to keep up. You could also simplify the mapping if it is heavy, by turning off indexing for as many fields as possible and by avoiding dynamic mapping altogether (sketched below).
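A minimal sketch of that mapping advice with the Java client could look like the following; the index, type and field names are made up for illustration:

    // Disable dynamic mapping and skip indexing for fields you only need to store,
    // so each bulk item carries less indexing and mapping work.
    String mapping =
        "{\n" +
        "  \"record\": {\n" +
        "    \"dynamic\": false,\n" +
        "    \"properties\": {\n" +
        "      \"payload\":   { \"type\": \"keyword\", \"index\": false },\n" +
        "      \"timestamp\": { \"type\": \"date\" }\n" +
        "    }\n" +
        "  }\n" +
        "}";

    client.admin().indices().prepareCreate("myindex")
            .addMapping("record", mapping)
            .get();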


I actually solved it by increasing the heap on the client that is doing the bulk requests, and the errors went away.
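Concretely, that just meant starting the importer JVM with a larger heap, something along these lines (sizes and class name are only placeholders):

    java -Xms2g -Xmx2g -cp importer.jar com.example.BulkImporter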
