Relationship between heap-size and shard-allocation - ES 7.0.x

dinesh_gnanasamy · March 1, 2020, 2:14pm

Hi,
I am trying to size a production cluster based on a dev setup, where I encountered Data too large .. [transport_request] (CircuitBreakingException) on the [parent] circuit breaker. Since retries are also exhausted, some shards get stuck in the UNASSIGNED state and the cluster state becomes RED.

I tried recovering the cluster using reroute, but it leaves the cluster in YELLOW state, again hitting the CircuitBreakingException.
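
For reference, allocations that have exhausted their retries can be re-attempted through the cluster reroute API with the `retry_failed` flag. Below is a minimal sketch that only builds the request; the base URL `http://localhost:9200` and the helper name `build_retry_failed_request` are assumptions for illustration, not something taken from this cluster.

```python
import urllib.request


def build_retry_failed_request(base_url="http://localhost:9200"):
    """Build (but do not send) a reroute request that retries failed allocations.

    base_url is a hypothetical address; point it at a reachable node.
    """
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


req = build_retry_failed_request()
print(req.get_method(), req.full_url)

# To send it against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

If the breaker is still over its limit when the retry runs, the allocation will most likely fail again, which matches the YELLOW state described above.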

I checked a few discussion threads in this context, and most of the recommendations were around increasing the heap size per instance as well as using G1GC.

My cluster config is as follows:

Environment: Kubernetes
Elasticsearch version : 7.0.1
Cluster setup : 3 physical nodes with 4 instances of ES per node (each instance running as a container within dedicated Kubernetes Pod)
Total Data nodes : 12 data nodes

Heap allocated per ES Instance Container : 14GB
Total memory allocated per ES instance (docker-container) : 28GB

Inter-nodal link bandwidth across physical nodes : 10Gbps

Largest shard size per instance : 11GB
Next largest shard size per instance : 8GB

Test Scenario:
For testing purposes, I have suppressed indexing traffic as well as querying traffic leaving all other configurations intact and bounced just one instance of ES out of 12.

Issue Reproduction
As a small-scale reproduction, I created a small cluster of 3 ES instances running in Docker containers, created one index with 5 primary shards and 5 replicas, and loaded it with dummy data up to a shard size of 95MB. The heap size allocated per instance was 512MB.

Observations during reproduction of issue
Even in this small reproduction setup, I was able to see the [parent] breaker tripping at least 2-3 times (and of course it recovered) in repeated testing. It was not as bad as in the scaled cluster, but the same exception could still be seen with repeated bouncing of ES instances.

My clarifications regarding the reproduction:

The on-disk shard size is 95MB. In the worst case, even if 2 shards are allocated concurrently in peer-recovery mode (indices.recovery.max_concurrent_file_chunks), the heap-memory requirement would be at most ~200MB in total. Please correct me if I am wrong. What are the other heap requirements during shard allocation? (Please note that no indexing or query load was running while shard allocation was in progress.)
Would reducing indices.recovery.max_concurrent_file_chunks from its default value of 2 to 1 reduce the demand on the heap?
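
The estimate above can be checked with a few lines of arithmetic. This is a rough sketch, assuming the parent breaker limit is 95% of the heap (to my knowledge the default of `indices.breaker.total.limit` in 7.x when real-memory accounting is enabled); the variable names are illustrative only.

```python
MB = 1024 * 1024

heap_bytes = 512 * MB          # heap per instance in the reproduction
parent_limit_fraction = 0.95   # assumption: 7.x default for the parent breaker
shard_bytes = 95 * MB          # on-disk shard size from the post
concurrent_shards = 2          # worst case assumed in the post

parent_limit = heap_bytes * parent_limit_fraction
whole_shard_estimate = shard_bytes * concurrent_shards
headroom = parent_limit - whole_shard_estimate

print(f"parent breaker limit      : {parent_limit / MB:.1f} MB")
print(f"two whole shards in memory: {whole_shard_estimate / MB:.1f} MB")
print(f"headroom left             : {headroom / MB:.1f} MB")
```

Even under the pessimistic assumption that two whole shards sit on the heap at once, roughly 296MB of headroom remains below the limit, so the shard data alone does not seem to explain the trips. Because the parent breaker in 7.x checks real heap usage by default, everything on the heap counts towards the limit, including garbage that has not been collected yet.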

NOTE: I understand that reducing indices.recovery.max_concurrent_file_chunks can slow down recovery. But I wanted to be clear on whether concurrent recovery of more than one shard is what puts demand on the heap and causes the CircuitBreakingException.
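
To try the change described in the note, the setting could be lowered through the cluster settings API. The sketch below only builds the JSON body for a `PUT _cluster/settings` request; it assumes the setting can be updated dynamically, and the helper name `recovery_settings_body` is hypothetical.

```python
import json


def recovery_settings_body(max_concurrent_file_chunks=1, persistent=False):
    """Return the body for PUT _cluster/settings (assumes a dynamic setting)."""
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "indices.recovery.max_concurrent_file_chunks": max_concurrent_file_chunks
        }
    }


body = recovery_settings_body()
print(json.dumps(body, indent=2))
```

A transient setting is used by default here so that the change disappears on a full cluster restart, which seems the safer choice for an experiment.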

Thanks in advance for any suggestions and advice.

Dinesh

HenningAndersen · March 4, 2020, 11:37am

Hi @dinesh_gnanasamy,

Do you use G1 GC or CMS? Also, the full exception line should contain more specific information; it would be nice to see the full line and stack trace.

dinesh_gnanasamy · March 5, 2020, 7:01am

Hi @HenningAndersen ,

Thanks for the reply

I use CMS only. Since I am running JDK 8, I thought I would stay away from G1GC, as I was not sure whether switching to it would help.

To see whether the situation improves, I increased the heap to 22gb and the overall memory allocation for the ES container to 32gb in the 12-node cluster I referred to in my previous message. With that, the full stack trace is as follows:

[2020-03-03T05:05:57,781][WARN ][o.e.c.r.a.AllocationService] [elasticsearch-10] failing shard [failed shard, shard [tsg_ngf_5da7856949bf2f00c69b07fe_assuranceinterfacemetrics_vertex_24_18319][4], node[35VghebSSsasUjGCN2Zs9A], [R], s[STARTED], a[id=9C_uI8m2RzK2OX66ur8taw], message [failed to perform indices:data/write/bulk[s] on replica [tsg_ngf_5da7856949bf2f00c69b07fe_assuranceinterfacemetrics_vertex_24_18319][4], node[35VghebSSsasUjGCN2Zs9A], [R], s[STARTED], a[id=9C_uI8m2RzK2OX66ur8taw]], failure [RemoteTransportException[[elasticsearch-2][10.60.0.213:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [22289099296/20.7gb], which is larger than the limit of [21848994611/20.3gb], real usage: [22289086648/20.7gb], new bytes reserved: [12648/12.3kb]]; ], markAsStale [true]]
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-1][10.60.0.213:9300][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [22289099296/20.7gb], which is larger than the limit of [21848994611/20.3gb], real usage: [22289086648/20.7gb], new bytes reserved: [12648/12.3kb]
	at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.0.1.jar:7.0.1]
	at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.0.1.jar:7.0.1]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1026) ~[elasticsearch-7.0.1.jar:7.0.1]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:922) ~[elasticsearch-7.0.1.jar:7.0.1]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:753) ~[elasticsearch-7.0.1.jar:7.0.1]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
	at java.lang.Thread.run(Thread.java:835) [?:?]
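
The numbers in the exception message can be pulled apart to see what actually tripped the breaker. The following is a minimal sketch that parses the message text quoted above; the regular expression and the helper name `parse_breaker_message` are my own and only cover this message format.

```python
import re

# Message text copied from the CircuitBreakingException in the log above.
LINE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[22289099296/20.7gb], which is larger than the limit of "
    "[21848994611/20.3gb], real usage: [22289086648/20.7gb], "
    "new bytes reserved: [12648/12.3kb]"
)

PATTERN = re.compile(
    r"would be \[(?P<would_be>\d+)/[^\]]+\].*?"
    r"limit of \[(?P<limit>\d+)/[^\]]+\].*?"
    r"real usage: \[(?P<real_usage>\d+)/[^\]]+\].*?"
    r"new bytes reserved: \[(?P<reserved>\d+)/[^\]]+\]"
)


def parse_breaker_message(message):
    """Extract the byte counts from a parent-breaker message as integers."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message is not in the expected parent-breaker format")
    return {name: int(value) for name, value in match.groupdict().items()}


fields = parse_breaker_message(LINE)
over_before_request = fields["real_usage"] - fields["limit"]

print(fields)
print("bytes over the limit before this request:", over_before_request)
```

Read this way, the 12.3kb bulk request is negligible: the real heap usage was already about 440MB above the limit before the request arrived, which suggests the heap as a whole was full rather than this particular request being large.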

I am not sure whether this observation is useful: in the above stack trace, the source node is elasticsearch-10, which is also the master node of the cluster.
