As a test, I followed the exact same procedure on another server (that should be configured identically) and it worked perfectly.
Relevant information from journalctl:
Any other pointers, please?
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.systemd.Libsystemd.lambda$static$0(Libsystemd.java:34)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.systemd.Libsystemd.<clinit>(Libsystemd.java:33)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.systemd.SystemdPlugin.sd_notify(SystemdPlugin.java:126)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.systemd.SystemdPlugin.onNodeStarted(SystemdPlugin.java:137)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.node.Node.start(Node.java:823)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:317)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:402)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.cli.Command.main(Command.java:90)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
Aug 04 16:45:09 server.domain.co.uk systemd-entrypoint[17950]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
Aug 04 16:45:10 server.domain.co.uk systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Aug 04 16:45:10 server.domain.co.uk systemd[1]: Unit elasticsearch.service entered failed state.
Aug 04 16:45:10 server.domain.co.uk systemd[1]: elasticsearch.service failed
No idea. You need to find more logs of some kind, or find a way to start it manually, as there must be some core issue with the JVM, disk, network, etc. It's kind of strange, though; Elasticsearch is usually good at showing errors.
Not really. All I can think of are the base Elasticsearch logs, to figure out whether it's really starting at all or whether some JVM issue prevents that. Something has to be different: maybe the Java version, or Java being installed on one VM but not the other (ES 7 ships its own JVM, but maybe something interferes with it, or HOME is set wrong, etc.).
Need to find the STDERR logs or output, see below.
But looking at your logs, it's failing in bootstrap, which checks things like host, IP, config, shared RAM, etc., so it must be failing one of those, just without any logs.
Looking at the code for the lines it mentions, in the latest version, it seems it can't create or set up the process, or can't daemonize:
But if that fails, it should show an error message from this exception, which it must be hitting before dying:
throw new UserException(ExitCodes.CONFIG, e.getMessage());
The real init code is here and complex, but stderr should show fatal startup errors. It can fail for many reasons, so you need to find that stderr, even if that means starting it manually on the command line; that should be easy.
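Since a service's stderr normally ends up in the journal under systemd, one way to hunt for the actual error is to hide the stack frames and keep everything else. This is only a rough sketch: `exception_headers` is a made-up helper name, and the pattern assumes frames look like the `at package.Class.method(File.java:NN)` lines in the extract above.

```shell
# Hypothetical helper: read log lines on stdin and drop Java stack frames
# ("at pkg.Class.method(...)"), so exception headers and systemd messages
# are easier to spot.
exception_headers() {
    grep -Ev '(^|[[:space:]])at [[:alnum:]_$.<>]+\('
}

# Usage (assumes the unit is named elasticsearch.service):
#   journalctl -u elasticsearch.service --no-pager | exception_headers
```

If nothing useful remains after filtering, the exception header was probably never written to the journal, and elasticsearch.log is the next place to look.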
The installed versions of Java are exactly the same across both servers, so it doesn't look like that.
Can you please explain the procedure?
The only thing I have found in any of the standard logs is the following extract from elasticsearch.log:
[2020-08-04T16:45:09,690][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [server.domain.co.uk] fatal error in thread [main], exiting
java.lang.NoClassDefFoundError: Could not initialize class com.sun.jna.Native
at org.elasticsearch.systemd.Libsystemd.lambda$static$0(Libsystemd.java:34) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:312) ~[?:?]
at org.elasticsearch.systemd.Libsystemd.<clinit>(Libsystemd.java:33) ~[?:?]
at org.elasticsearch.systemd.SystemdPlugin.sd_notify(SystemdPlugin.java:126) ~[?:?]
at org.elasticsearch.systemd.SystemdPlugin.onNodeStarted(SystemdPlugin.java:137) ~[?:?]
at java.util.ArrayList.forEach(ArrayList.java:1510) ~[?:?]
at org.elasticsearch.node.Node.start(Node.java:823) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:317) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:402) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.8.1.jar:7.8.1]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.8.1.jar:7.8.1]
I suspect that your /tmp is mounted with noexec. Check it like this:
cat /etc/fstab |grep tmp
The result on my machine is:
/dev/mapper/rhel-tmp /tmp xfs defaults,nodev,noexec,nosuid 0 0
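Note that /etc/fstab only shows the configured options; the live mount can differ. A minimal sketch for checking an option string, assuming `findmnt` from util-linux is available (the `has_noexec` helper name is made up for illustration):

```shell
# Hypothetical helper: succeed if the comma-separated mount-option string
# in $1 contains the exact option "noexec".
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage against the live mount (assumes findmnt supports --target):
#   has_noexec "$(findmnt -no OPTIONS --target /tmp)" && echo "/tmp is noexec"
```

Matching on whole comma-separated fields avoids false positives from options that merely contain the string.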
If noexec is set, you should create another tmp directory for the elasticsearch service. For example:
mkdir -p /home/elasticsearch/tmp
chown -R elasticsearch:elasticsearch /home/elasticsearch/
and change the configuration:
vi /etc/elasticsearch/jvm.options
## JVM temporary directory
# -Djava.io.tmpdir=${ES_TMPDIR}
-Djava.io.tmpdir=/home/elasticsearch/tmp
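Before restarting the service, it may be worth confirming that the new directory really allows execution. The sketch below is only a rough proxy: it runs a tiny script from the directory rather than loading a native library the way JNA does, and `dir_allows_exec` is a made-up helper name.

```shell
# Hypothetical helper: succeed if a file created in directory $1 can be
# executed. Fails if the directory is missing, not writable, or noexec.
dir_allows_exec() {
    probe=$(mktemp "$1/exec-probe.XXXXXX" 2>/dev/null) || return 1
    printf '#!/bin/sh\nexit 0\n' > "$probe"
    chmod +x "$probe"
    "$probe" 2>/dev/null
    status=$?
    rm -f "$probe"          # clean up the probe file either way
    return "$status"
}

# Usage (ideally run as the elasticsearch user so permissions are tested too):
#   dir_allows_exec /home/elasticsearch/tmp && echo "exec allowed"
```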
Wow, would NEVER have guessed that in a zillion years, and we used to set NOEXEC on /tmp as routine best practice - wonder how it got set on one of his servers but not another.
Really need better logging or at least some kind of bootstrap check for this; insidious.
BUT his defaults,noauto should not include NOEXEC. The man page says defaults means: rw, suid, dev, exec, auto, nouser, and async.
So I wonder if there is another issue that just using a different directory fixed.
Also I thought CentOS 7 was already ext4; no, it's xfs - so wondering how this /tmp was created as ext3 - this suggests an unusual setup / config process.
We discovered this problem while installing ELK on RHEL 7.8 with xfs. During installation we had the same problem with the elasticsearch and logstash services.
You are right that there should be more information in the log.