I'm running into some odd behavior with my Elasticsearch (v7.2) clusters: nodes are hitting the parent circuit breaker exception under relatively light load, even though each node is allocated a 31GB heap.
The error I receive after a node is up for about a day is:
[2019-08-25T13:47:56,529][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [my-node-1] failed to execute on node [i9gGZceXTSKRgZvjEcFh9g] org.elasticsearch.transport.RemoteTransportException: [my-node-2][10.84.207.184:9500][cluster:monitor/nodes/stats[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [33245621418/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33245618176/30.9gb], new bytes reserved: [3242/3.1kb]
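As a sanity check on the numbers in that message, here is a small Python sketch I put together. It assumes the parent breaker limit is the 7.x default of 95% of the heap (I have not set `indices.breaker.total.limit` myself), and the function and constant names are just mine for illustration.

```python
import re

# The breaker message from the log above, trimmed to the part with the numbers.
MESSAGE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[33245621418/30.9gb], which is larger than the limit of "
    "[31621696716/29.4gb], real usage: [33245618176/30.9gb], "
    "new bytes reserved: [3242/3.1kb]"
)

HEAP_BYTES = 33285996544      # taken from -Xmx in my JVM arguments
ASSUMED_LIMIT_PERCENT = 95    # assumption: default parent breaker limit in 7.x


def parse_breaker_message(message):
    """Return the four byte counts in the order they appear in the message."""
    would_be, limit, real_usage, reserved = (
        int(n) for n in re.findall(r"\[(\d+)/", message)
    )
    return {
        "would_be": would_be,
        "limit": limit,
        "real_usage": real_usage,
        "reserved": reserved,
    }


parsed = parse_breaker_message(MESSAGE)

# Integer arithmetic avoids float rounding on values this large.
expected_limit = HEAP_BYTES * ASSUMED_LIMIT_PERCENT // 100

print("heap is exactly 31 GiB:", HEAP_BYTES == 31 * 1024**3)
print("limit matches 95% of heap:", expected_limit == parsed["limit"])
print("would_be = real usage + reserved:",
      parsed["would_be"] == parsed["real_usage"] + parsed["reserved"])
print("real usage as a fraction of heap:", parsed["real_usage"] / HEAP_BYTES)
```

If that assumption holds, the limit in the message is simply 95% of my 31GB heap, and the real usage was already about 99.9% of the heap before the 3.1kb request was counted. So the request itself is tiny; the heap is what is nearly full.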
I initially thought the issue was due to G1GC, but when we changed the runtime command to use CMS, the cluster became nearly unresponsive. I switched back to G1GC and added some JVM flags to gain insight into compressed OOPs:
-XX:+UnlockDiagnosticVMOptions, -XX:+PrintCompressedOopsMode
However, the only info I receive in the logs regarding pointers is that compressed OOPs are set to true:
[2019-08-27T01:01:11,064][INFO ][o.e.e.NodeEnvironment ] [my-node-1] heap size [31gb], compressed ordinary object pointers [true]
What other methods can I use to check whether I am running zero-based compressed OOPs? Alternatively, perhaps I'm passing conflicting flags at startup. My startup arguments are:
[2019-08-27T01:01:11,319][INFO ][o.e.n.Node ] [my-node-1] JVM arguments [-XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -Des.allow_insecure_settings=true, -XX:+HeapDumpOnOutOfMemoryError, -Dmapper.allow_dots_in_name=true, -Xms33285996544, -Xmx33285996544, -XX:+UseG1GC, -XX:+UnlockDiagnosticVMOptions, -XX:+PrintCompressedOopsMode, -Dio.netty.allocator.type=pooled, -XX:MaxDirectMemorySize=16642998272, -Des.path.home=/usr/share/elasticsearch-all/elasticsearch-7.2.0, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=tar, -Des.bundled_jdk=true]
Is there any issue with passing CMS-specific flags (e.g. -XX:+UseCMSInitiatingOccupancyOnly) while also enabling G1GC (e.g. -XX:+UseG1GC)?
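To make the mismatch I'm asking about concrete, here is a rough Python sketch that scans the GC-related subset of my arguments. The "flag name contains CMS" heuristic and the helper name are my own assumptions, and the selector mapping is illustrative rather than exhaustive. It only shows which flags do not belong to the selected collector; it says nothing about how the JVM treats them.

```python
# Collector-selection flags; an illustrative mapping, not an exhaustive one.
SELECTORS = {
    "-XX:+UseG1GC": "G1",
    "-XX:+UseConcMarkSweepGC": "CMS",
}

# GC-related subset of the JVM arguments from my log line.
JVM_ARGS = [
    "-XX:CMSInitiatingOccupancyFraction=75",
    "-XX:+UseCMSInitiatingOccupancyOnly",
    "-XX:+DisableExplicitGC",
    "-XX:+AlwaysPreTouch",
    "-XX:+UseG1GC",
]


def find_mismatched_cms_flags(args):
    """Return (selected collectors, CMS tuning flags passed without CMS selected)."""
    selected = [SELECTORS[a] for a in args if a in SELECTORS]
    # Heuristic (my assumption): a -XX flag with "CMS" in its name tunes CMS.
    cms_tuning = [
        a for a in args
        if a.startswith("-XX:") and "CMS" in a and a not in SELECTORS
    ]
    mismatched = [] if "CMS" in selected else cms_tuning
    return selected, mismatched


selected, mismatched = find_mismatched_cms_flags(JVM_ARGS)
print("selected collectors:", selected)
print("CMS flags without CMS selected:", mismatched)
```

On my arguments this reports G1 as the only selected collector, with both CMS occupancy flags left over from the earlier configuration.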