Verifying Zero-Based Compressed OOPs & reviewing my JVM configs

I'm running into some odd behavior with my Elasticsearch 7.2 clusters: nodes are hitting the parent circuit breaker exception under relatively light load, even though each node has a 31GB heap allocation.

The error I receive after a node is up for about a day is:

[2019-08-25T13:47:56,529][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [my-node-1] failed to execute on node [i9gGZceXTSKRgZvjEcFh9g] org.elasticsearch.transport.RemoteTransportException: [my-node-2][10.84.207.184:9500][cluster:monitor/nodes/stats[n]] 
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [33245621418/30.9gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33245618176/30.9gb], new bytes reserved: [3242/3.1kb]
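For anyone following along, the parent breaker's estimated usage and limit can be polled while the node is still up via the standard nodes stats API; a rough sketch (the jq filter is only an assumption for readability, not required):

$ curl -s 'localhost:9200/_nodes/stats/breaker?pretty' | jq '.nodes[] | {name: .name, parent: .breakers.parent}'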

I initially thought the issue was due to G1GC; however, when we switched the runtime to CMS, the cluster became nearly unresponsive. I switched back to G1GC and added some diagnostic flags to gain insight into OOPs:
-XX:+UnlockDiagnosticVMOptions, -XX:+PrintCompressedOopsMode
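(These flags typically go into Elasticsearch's jvm.options file rather than directly onto the command line; a minimal sketch, with the path assumed from -Des.path.conf below:)

# /etc/elasticsearch/jvm.options (sketch; path assumed)
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompressedOopsMode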

However, the only info I receive in the logs regarding pointers is that Compressed OOPs are set to true:

[2019-08-27T01:01:11,064][INFO ][o.e.e.NodeEnvironment    ] [my-node-1] heap size [31gb], compressed ordinary object pointers [true]

I'm curious what other methods I can use to check whether I'm actually running zero-based compressed OOPs, or whether I'm perhaps passing conflicting flags at startup. My startup flags are below:

[2019-08-27T01:01:11,319][INFO ][o.e.n.Node               ] [my-node-1] JVM arguments [-XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -Des.allow_insecure_settings=true, -XX:+HeapDumpOnOutOfMemoryError, -Dmapper.allow_dots_in_name=true, -Xms33285996544, -Xmx33285996544, -XX:+UseG1GC, -XX:+UnlockDiagnosticVMOptions, -XX:+PrintCompressedOopsMode, -Dio.netty.allocator.type=pooled, -XX:MaxDirectMemorySize=16642998272, -Des.path.home=/usr/share/elasticsearch-all/elasticsearch-7.2.0, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=tar, -Des.bundled_jdk=true]

Is there any issue with passing CMS-specific flags (e.g. -XX:+UseCMSInitiatingOccupancyOnly) while also enabling G1GC (-XX:+UseG1GC)?

Hi @seth.yes,

It sounds like your heap is nearly full and that G1 copes with this slightly more gracefully than CMS. Can you share the GC log files here? Also, for completeness, the Java version.

Zero-based compressed oops are likely not available with a 31GB heap. My machine does not like going beyond 30GB, but this varies between platforms. You can do some preliminary checking by running java standalone with the same heap size, something like:

java -XX:+UseG1GC -Xmx30g -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode X

will give an indication, though the exact options passed to java might affect this. (The trailing X is just a dummy class name so the JVM initializes, prints the OOPs mode, and exits.)
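On JDK 9 and later (including the bundled JDK 12), unified logging offers an equivalent check that exits cleanly without a dummy class; a hedged alternative, assuming -Xlog is available:

$ java -XX:+UseG1GC -Xmx30g -Xlog:gc+heap+coops=info -version

The gc+heap+coops line should state whether the compressed oops mode is zero based.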

Whether zero based compressed oops are used is likely unrelated to the issue described above.

The CMS options should be simply ignored when using G1.
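One way to confirm which collector actually wins is the same PrintFlagsFinal trick used later in this thread; a quick sketch:

$ java -XX:+UseG1GC -XX:CMSInitiatingOccupancyFraction=75 -XX:+PrintFlagsFinal -version 2>/dev/null | grep -E 'UseG1GC|UseConcMarkSweepGC'

UseG1GC should show := true while UseConcMarkSweepGC stays false.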

Thanks for your response @HenningAndersen. Yes we are using the bundled OpenJDK 12 that comes with ES 7.2.0. I read in another thread that some users are experiencing issues with ES 7.2 while using OpenJDK 12 and G1GC.

It looks like compressed OOPs are enabled, at least (though this check alone doesn't confirm they are zero-based):

$ JAVA_HOME=$ES_HOME/jdk java -Xmx33285996544 -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops
     bool UseCompressedOops                        := true                                {lp64_product}

I'll enable GC logging and get back to you here shortly.
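For JDK 9+, GC logging is enabled with a unified-logging line in jvm.options; a rough sketch only, with the log path and rotation values as placeholders rather than the exact ES defaults:

-Xlog:gc*,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m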

The issue ended up being a problem with OpenJDK 12 running on Docker:

Apparently, unless you explicitly set the processor count in your JVM options, the JVM misdetects the number of cores available inside the Docker container and falls back to 1.

We got around this by explicitly setting the number of processors in the ES startup flags:

-XX:ActiveProcessorCount=$num_cpu_core
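To sanity-check that the override took effect, the nodes info API reports the processor counts the JVM and ES see; a hedged sketch (the filter_path is only for brevity):

$ curl -s 'localhost:9200/_nodes/os?filter_path=nodes.*.name,nodes.*.os.available_processors,nodes.*.os.allocated_processors'

available_processors should now match the value passed via ActiveProcessorCount instead of 1.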
