Elasticsearch appears to ignore -XX:+UseG1GC in jvm.options


(Michael Goldsby) #1

Hi,

We're running Elasticsearch 5.4.1 on OpenJDK 1.8.0_141. We recently tried switching from CMS to G1GC by replacing the following lines in jvm.options:

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

with:

-XX:+UseG1GC

However, upon starting ES, the setting is simply ignored, and it reverts to the default Parallel GC (which is horrible). The flag doesn't even appear in the command line flags:

elastic+ 18358 273 59.7 7446480 4580292 ? SLl 00:29 0:13 /usr/bin/java -Xms4g -Xmx4g -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -Epath.conf=/etc/elasticsearch-0

Compared to when using CMS:

elastic+ 19758 136 57.3 6820280 4392720 ? SLl 00:30 0:02 /usr/bin/java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -Epath.conf=/etc/elasticsearch-0

I know there is a bootstrap check to ensure we're not running JDK < 1.8u40, but this looks more like the option is being silently ignored. Has anybody else encountered this?

Mike


(Mark Walkom) #2

We currently don't support G1GC as there are known issues with Lucene.

I don't know if we actively ignore users trying to set it though so I can't explain this. I will see what I can find out though.


(Jason Tedor) #3

We are not stripping it, and we are not ignoring it. Are you sure that you have -XX:+UseG1GC at the start of a line (no leading whitespace), and also not commented out? If so, can you share the full contents of your jvm.options in a gist somewhere so that I can try to reproduce? Alternatively, can you start Elasticsearch with bash -x bin/elasticsearch and share the output here so that I can see what's happening?

To note, as @warkolm mentions, we do not support G1GC. However, the primary reason these days is that performance with G1GC is completely unacceptable, G1GC steals too many application CPU cycles.


(Michael Goldsby) #4

Jason,

Thanks for replying. Here's our complete jvm.options: https://gist.github.com/golfdish/c051cbe9e1196a7acbbbe9559666a4dc

And the output of running Elasticsearch with -x:

$ sudo -u elastic bash -x /usr/share/elasticsearch/bin/elasticsearch
+ echo distribution
+ grep project.name
+ test -n ''
+ test -n ''
+ test -n ''
+ test -n ''
+ test -n ''
+ test -n ''
+ test -n ''
+ test -n ''
+ CDPATH=
+ SCRIPT=/usr/share/elasticsearch/bin/elasticsearch
+ '[' -h /usr/share/elasticsearch/bin/elasticsearch ']'
++ dirname /usr/share/elasticsearch/bin/elasticsearch
+ ES_HOME=/usr/share/elasticsearch/bin/..
++ cd /usr/share/elasticsearch/bin/..
++ pwd
+ ES_HOME=/usr/share/elasticsearch
+ '[' -z '' ']'
+ for jvm_options in '"$ES_HOME"/config/jvm.options' /etc/elasticsearch/jvm.options
+ '[' -r /usr/share/elasticsearch/config/jvm.options ']'
+ for jvm_options in '"$ES_HOME"/config/jvm.options' /etc/elasticsearch/jvm.options
+ '[' -r /etc/elasticsearch/jvm.options ']'
++ parse_jvm_options ''
++ '[' -f '' ']'
+ ES_JAVA_OPTS=' '
+ '[' x = x ']'
++ dirname /usr/share/elasticsearch/bin/elasticsearch
+ for include in /usr/share/elasticsearch/elasticsearch.in.sh /usr/local/share/elasticsearch/elasticsearch.in.sh /opt/elasticsearch/elasticsearch.in.sh '~/.elasticsearch.in.sh' '"$ES_HOME/bin/elasticsearch.in.sh"' '"`dirname "$0"`"/elasticsearch.in.sh'
+ '[' -r /usr/share/elasticsearch/elasticsearch.in.sh ']'
+ for include in /usr/share/elasticsearch/elasticsearch.in.sh /usr/local/share/elasticsearch/elasticsearch.in.sh /opt/elasticsearch/elasticsearch.in.sh '~/.elasticsearch.in.sh' '"$ES_HOME/bin/elasticsearch.in.sh"' '"`dirname "$0"`"/elasticsearch.in.sh'
+ '[' -r /usr/local/share/elasticsearch/elasticsearch.in.sh ']'
+ for include in /usr/share/elasticsearch/elasticsearch.in.sh /usr/local/share/elasticsearch/elasticsearch.in.sh /opt/elasticsearch/elasticsearch.in.sh '~/.elasticsearch.in.sh' '"$ES_HOME/bin/elasticsearch.in.sh"' '"`dirname "$0"`"/elasticsearch.in.sh'
+ '[' -r /opt/elasticsearch/elasticsearch.in.sh ']'
+ for include in /usr/share/elasticsearch/elasticsearch.in.sh /usr/local/share/elasticsearch/elasticsearch.in.sh /opt/elasticsearch/elasticsearch.in.sh '~/.elasticsearch.in.sh' '"$ES_HOME/bin/elasticsearch.in.sh"' '"`dirname "$0"`"/elasticsearch.in.sh'
+ '[' -r /home/mgoldsby/.elasticsearch.in.sh ']'
+ for include in /usr/share/elasticsearch/elasticsearch.in.sh /usr/local/share/elasticsearch/elasticsearch.in.sh /opt/elasticsearch/elasticsearch.in.sh '~/.elasticsearch.in.sh' '"$ES_HOME/bin/elasticsearch.in.sh"' '"`dirname "$0"`"/elasticsearch.in.sh'
+ '[' -r /usr/share/elasticsearch/bin/elasticsearch.in.sh ']'
+ . /usr/share/elasticsearch/bin/elasticsearch.in.sh
++ '[' x '!=' x ']'
++ ES_CLASSPATH='/usr/share/elasticsearch/lib/*'
+ break
+ '[' -x /bin/java ']'
++ which java
+ JAVA=/usr/bin/java
+ '[' '!' -x /usr/bin/java ']'
+ '[' -z '/usr/share/elasticsearch/lib/*' ']'
+ '[' x '!=' x ']'
+ '[' x '!=' x ']'
+ /usr/bin/java -cp '/usr/share/elasticsearch/lib/*' org.elasticsearch.tools.JavaVersionChecker
+ '[' 0 -ne 0 ']'
++ hostname
++ cut -d. -f1
+ HOSTNAME=elasticlogsworker-staging-iad-06285d1eeddf3992d
+ export HOSTNAME
++ echo
++ egrep -- '(^-d |-d$| -d |--daemonize$|--daemonize )'
+ daemonized=
+ '[' -z '' ']'
+ exec /usr/bin/java -Des.path.home=/usr/share/elasticsearch -cp '/usr/share/elasticsearch/lib/*' org.elasticsearch.bootstrap.Elasticsearch
2017-08-14 17:13:44,199 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
	at java.lang.SecurityManager.checkPermission(SecurityManager.java:585)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanTrustPermission(DefaultMBeanServerInterceptor.java:1848)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:322)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
	at org.apache.logging.log4j.core.jmx.Server.register(Server.java:389)
	at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:167)
	at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)
	at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:556)
	at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:261)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:206)
	at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:221)
	at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:197)
	at org.elasticsearch.common.logging.LogConfigurator.configureStatusLogger(LogConfigurator.java:175)
	at org.elasticsearch.common.logging.LogConfigurator.configureWithoutConfig(LogConfigurator.java:99)
	at org.elasticsearch.cli.Command.main(Command.java:85)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)

Exception in thread "main" java.lang.IllegalStateException: status logger logged an error before logging was configured
	at org.elasticsearch.common.logging.LogConfigurator.checkErrorListener(LogConfigurator.java:128)
	at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:117)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:316)
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123)
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114)
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
	at org.elasticsearch.cli.Command.main(Command.java:88)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)
Refer to the log for complete error details.

(Jason Tedor) #5

The debug output indicates that /usr/share/elasticsearch/config/jvm.options and /etc/elasticsearch/jvm.options are not readable by the elastic user (either the file does not exist, or permissions are borked). Can you please check the location you expect the jvm.options to be in (it should be /etc/elasticsearch) and check ownership and permissions?


(Michael Goldsby) #6

That's odd, since it definitely picks up the CMS flags I referenced in my first post.

In any case, here are the permissions of /etc/elasticsearch-0 (we run elasticsearch with -Epath.conf=/etc/elasticsearch-0, to support multiple nodes per host):

$ sudo ls -la /etc/elasticsearch-0/
total 36
drwxr-x---   6 root elasticsearch  4096 Aug 14 18:29 .
drwxr-xr-x 129 root root          12288 Aug  8 17:25 ..
drwxr-x---   2 root elasticsearch  4096 May 20 19:05 discovery-ec2
-rw-rw----   1 root elasticsearch  1424 Aug  7 23:06 elasticsearch.yml
-rw-r--r--   1 root elasticsearch  3137 Aug 14 17:12 jvm.options
-rw-r--r--   1 root elasticsearch  3788 May 16 22:16 log4j2.properties
drwxr-x---   2 root elasticsearch  4096 Jun 14 23:19 repository-s3
drwxr-x---   2 root elasticsearch  4096 Apr 28 17:45 scripts

Note that ES appears hardcoded to read jvm.options from /etc/elasticsearch even when -Epath.conf is set, so we have the files duplicated there:

$ sudo ls -la /etc/elasticsearch
total 36
drwxr-x---   5 root elasticsearch  4096 Aug 14 18:28 .
drwxr-xr-x 129 root root          12288 Aug  8 17:25 ..
drwxr-x---   2 root elasticsearch  4096 May 20 19:05 discovery-ec2
-rw-r--r--   1 root elasticsearch  3237 Aug 14 18:17 jvm.options
-rw-r--r--   1 root elasticsearch  3788 May 16 22:16 log4j2.properties
drwxr-x---   2 root elasticsearch  4096 Jun 14 23:19 repository-s3
drwxr-x---   2 root elasticsearch  4096 Apr 28 17:45 scripts

When I run sudo -u elasticsearch bash -x /usr/share/elasticsearch/bin/elasticsearch -Epath.conf=/etc/elasticsearch-0, the GC report at the end reads:

Heap
 PSYoungGen      total 1223168K, used 188744K [0x000000076ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 1048576K, 18% used [0x000000076ab00000,0x00000007763521e8,0x00000007aab00000)
  from space 174592K, 0% used [0x00000007b5580000,0x00000007b5580000,0x00000007c0000000)
  to   space 174592K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007b5580000)
 ParOldGen       total 2796544K, used 0K [0x00000006c0000000, 0x000000076ab00000, 0x000000076ab00000)
  object space 2796544K, 0% used [0x00000006c0000000,0x00000006c0000000,0x000000076ab00000)
 Metaspace       used 14238K, capacity 14598K, committed 14976K, reserved 1062912K
  class space    used 1708K, capacity 1826K, committed 1920K, reserved 1048576K

Note that Parallel GC is still being used, despite the -XX:+UseG1GC flag. But if I set the CMS flags I pasted in the first post in /etc/elasticsearch/jvm.options, the GC output is:

Heap
 par new generation   total 306688K, used 103644K [0x00000006c0000000, 0x00000006d4cc0000, 0x00000006d4cc0000)
  eden space 272640K,  38% used [0x00000006c0000000, 0x00000006c6537320, 0x00000006d0a40000)
  from space 34048K,   0% used [0x00000006d0a40000, 0x00000006d0a40000, 0x00000006d2b80000)
  to   space 34048K,   0% used [0x00000006d2b80000, 0x00000006d2b80000, 0x00000006d4cc0000)
 concurrent mark-sweep generation total 3853568K, used 0K [0x00000006d4cc0000, 0x00000007c0000000, 0x00000007c0000000)
 Metaspace       used 14257K, capacity 14598K, committed 14720K, reserved 1062912K
  class space    used 1708K, capacity 1826K, committed 1920K, reserved 1048576K

So it's obviously reading those flags from that file.


(Jason Tedor) #7

In your first bash -x output, you used the elastic user and now you seem to be using elasticsearch. Which user is correct, and can the elastic user read those files? The first output that you shared from bash -x is definitvely not able to find any jvm.options that could be read. Can you please share output from bash -x again with the elasticsearch user?


(Michael Goldsby) #8

Running as elastic was my error, the proper user is elasticsearch.

Full output of sudo -u elasticsearch bash -x /usr/share/elasticsearch/bin/elasticsearch -Epath.conf=/etc/elasticsearch-0: https://gist.github.com/golfdish/bceee34da2638cccc8c7c3a84f146ecb

Startup continues normally afterward.


(Jason Tedor) #9

This line indicates that /etc/elasticsearch/jvm.options is being parsed:

+++ grep '^-' /etc/elasticsearch/jvm.options

This line indicates that -XX:+UseG1GC is not in that file:

++ echo '-Xms4g -Xmx4g -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps '

Are you sure that you're placing the -XX:+UseG1GC in the right file? Are you sure that it's in /etc/elasticsearch/jvm.options?

Note that earlier you shared a gist that showed the heap size being set to 30g yet the output here shows 4g as the heap size. Are we really looking at the right file?

Or do you maybe have an automated process that is replacing /etc/elastcisearch/jvm.options, for example Puppet or some other process that is getting in the way here?


(Michael Goldsby) #10

The 30g heap was from a production host, before I started testing on our staging environment. Apart from heap sizes the files are identical, and I can verify that -XX:+UseG1GC does appear in /etc/elasticsearch/jvm.options on the host where I executed the above commands.

Outside of my own editing, the files are only (potentially) changed at deploy time, but we are not running any deploys concurrent with my testing. The only changes to the files are mine, manually.


(Jason Tedor) #11

You have the wrong character prefixing XX:G1GC:

$ curl -sS -XGET https://gist.githubusercontent.com/golfdish/c051cbe9e1196a7acbbbe9559666a4dc/raw/f47d3109a144ee2f58f842b17e5842c77859fc63/jvm.options | grep --color=never UseG1GC | xxd
00000000: e280 9358 583a 2b55 7365 4731 4743 0a    ...XX:+UseG1GC.

That indicates the character there has bytes 0xe28093 before the XX. It should be 0x2d for the - character, for example:

$ curl -sS -XGET https://gist.githubusercontent.com/golfdish/c051cbe9e1196a7acbbbe9559666a4dc/raw/f47d3109a144ee2f58f842b17e5842c77859fc63/jvm.options | grep --color=never Xss | xxd
00000000: 2d58 7373 316d 0a                        -Xss1m.

(Michael Goldsby) #12

Good grief, that was a unicode en dash all along.

It works now, naturally. Moral: be careful where you copy/paste from.

Thank you and your sharp eyes!


(Jason Tedor) #13

You're welcome.

It was a hunch that occurred to me while running errands this afternoon, I really could not wait to get home and verify it. :smile:


(Michael Goldsby) #14

Incidentally, going from CMS to G1GC was night and day. We're operating a large, high-volume logging cluster, and on CMS we had nodes dying in GC spirals on a daily basis. On G1GC our cluster has been infinitely more stable.


(Zoran Simic) #15

Agreed, I also switched to G1GC for a large logstash installation and find it to be performing much better than CMS.

I am curious to see why/when G1GC wouldn't work (that's why I read this thread), but so far it seems to be doing rather better than CMS.


(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.