I have a problem with starting one of my nodes.
The ES runs on a cluster of 3 nodes (version 6.6).
Node2 doesn't want to start up.
When I start it from systemctl it seems to be running:
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-02-14 14:41:15 CET; 4s ago
Docs: http://www.elastic.co
Main PID: 20149 (java)
CGroup: /system.slice/elasticsearch.service
└─20149 /bin/java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless...
Feb 14 14:41:15 aspnaplod2 systemd[1]: Started Elasticsearch.
But after a few seconds it fails, without writing a single line of logs:
Feb 14 14:41:15 aspnaplod2 systemd[1]: Started Elasticsearch.
Feb 14 14:41:21 aspnaplod2 systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Feb 14 14:41:21 aspnaplod2 systemd[1]: Unit elasticsearch.service entered failed state.
Feb 14 14:41:21 aspnaplod2 systemd[1]: elasticsearch.service failed.
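For reference, this is roughly how I'm checking for output - the unit's journal plus the log directory (the path below is the package default, mine points elsewhere):

    # pull the recent journal entries for the unit
    journalctl -u elasticsearch.service -n 200 --no-pager
    # check whether anything new landed in the log directory
    ls -ltr /var/log/elasticsearch/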
Hi,
Yes, the other nodes start correctly. The health state is yellow now (unable to allocate some replica shards).
This cluster was set up last November. The only recent addition to elasticsearch.yml is the extra auditing setting (xpack.security.audit.logfile.events.emit_request_body: true), although this was added on the other nodes as well. On node-2 there are some other components running without any problem (Filebeat, Metricbeat, Logstash).
The logfile location is set in elasticsearch.yml (to a folder on a different disk) - that was done at the beginning.
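For reference, the relevant lines in elasticsearch.yml look roughly like this (the path value is only an illustration of the separate-disk folder, not the exact one):

    # elasticsearch.yml (excerpt)
    path.logs: /data/elasticsearch/logs   # folder on a different disk
    xpack.security.audit.logfile.events.emit_request_body: true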
That certainly sounds like a node-specific issue then, rather than the cluster as a whole. If you have made changes to the elasticsearch.yml file, I'd start by rolling back to the old version that worked (if you still have it) and re-applying the change, in case an error was introduced. If not, load up one of the known-good files side by side with the one that isn't working. Obviously there will be some differences for things like hostname, IP etc., but check everything to make sure you don't have a formatting issue in there, as it could be that one entry is set wrong and is throwing out everything else. Alternatively, you can take the file from a good node and change the node-specific variables for the one that is faulting. Keep a backup of the old file for reference if you do this.
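For example, once you've copied the file over from a good node, a quick diff makes any unexpected difference easy to spot (filenames here are just placeholders):

    # compare the known-good config against the failing node's copy
    diff -u elasticsearch.yml.from-node1 /etc/elasticsearch/elasticsearch.yml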
It's been a while since I had to do it so I can't remember the exact process, but you could look up how to launch it in a session rather than as a service, so you see the errors on screen.
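If it's a package (RPM/DEB) install the startup script normally lives under /usr/share/elasticsearch, so something along these lines should run it in the foreground and print any startup error straight to the terminal (paths are the package defaults - adjust them if your layout differs):

    # run Elasticsearch interactively as the elasticsearch user, using the normal config directory
    sudo -u elasticsearch env ES_PATH_CONF=/etc/elasticsearch /usr/share/elasticsearch/bin/elasticsearch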
It's drastic, but if you really can't find your needle in the haystack as to why this isn't working, there is always the option to rebuild the node, re-introduce it to the cluster, and let it all balance out.