Elasticsearch Service won't start after full cluster upgrade from 5.4 to 6.0

Hi,
I was running a 3-node Elasticsearch 5.4 cluster on CentOS and decided to upgrade it to version 6. I did a full cluster restart upgrade, following all the steps at https://www.elastic.co/guide/en/elasticsearch/reference/current/restart-upgrade.html#restart-upgrade. After installing Elasticsearch 6.0, the elasticsearch service won't start. Can anyone help? I have a lot of important data on this cluster and I didn't take a snapshot of it. Is there any way to downgrade to the previous version or recover this data?

After trying to restart the elasticsearch service, this is what systemctl status showed. Nothing was logged in /var/log/elasticsearch/elk-cluster.log.
[anar@elknode2 elasticsearch]$ sudo systemctl status elasticsearch -l
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-11-17 10:04:50 +08; 9s ago
Docs: http://www.elastic.co
Process: 3068 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=203/EXEC)

Nov 17 10:04:50 elknode2 systemd[1]: Starting Elasticsearch...
Nov 17 10:04:50 elknode2 systemd[3068]: Failed at step EXEC spawning /usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec: No such file or directory
Nov 17 10:04:50 elknode2 systemd[1]: elasticsearch.service: control process exited, code=exited status=203
Nov 17 10:04:50 elknode2 systemd[1]: Failed to start Elasticsearch.
Nov 17 10:04:50 elknode2 systemd[1]: Unit elasticsearch.service entered failed state.
Nov 17 10:04:50 elknode2 systemd[1]: elasticsearch.service failed.

ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=203/EXEC)

I checked the /usr/share/elasticsearch/bin/ folder and there was no elasticsearch-systemd-pre-exec file, so I tried commenting out this line in the elasticsearch.service file. Then it gave me another error, shown below.

● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-11-17 10:11:57 +08; 2s ago
Docs: http://www.elastic.co
Process: 26833 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR} (code=exited, status=1/FAILURE)
Main PID: 26833 (code=exited, status=1/FAILURE)

Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,607 main ERROR Null object returned for RollingFile in Appenders.
Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,608 main ERROR Null object returned for RollingFile in Appenders.
Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,608 main ERROR Unable to locate appender "rolling" for logger config "root"
Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,608 main ERROR Unable to locate appender "index_indexing_slowlog_rolling" for logger config "index.indexing.slowlog.index"
Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,609 main ERROR Unable to locate appender "audit_rolling" for logger config "org.elasticsearch.xpack.security.audit.logfile.LoggingAuditTrail"
Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,609 main ERROR Unable to locate appender "index_search_slowlog_rolling" for logger config "index.search.slowlog"
Nov 17 10:11:56 ELKNODE1 elasticsearch[26833]: 2017-11-17 10:11:56,610 main ERROR Unable to locate appender "deprecation_rolling" for logger config "org.elasticsearch.deprecation"
Nov 17 10:11:57 ELKNODE1 systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Nov 17 10:11:57 ELKNODE1 systemd[1]: Unit elasticsearch.service entered failed state.
Nov 17 10:11:57 ELKNODE1 systemd[1]: elasticsearch.service failed.

Thanks in Advance

What's in /var/log/elasticsearch/?

There was no log written in /var/log/elasticsearch

Just this
[2017-11-16T15:29:15,264][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
[2017-11-16T15:29:17,862][INFO ][o.e.n.Node ] [elk-node-1] stopped
[2017-11-16T15:29:17,862][INFO ][o.e.n.Node ] [elk-node-1] closing ...
[2017-11-16T15:29:19,388][INFO ][o.e.n.Node ] [elk-node-1] closed

The file /usr/lib/systemd/system/elasticsearch.service does not seem to have been upgraded successfully; it is still using the old config. The ExecStart command should be:
ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet
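
To check a unit file for the two leftovers seen in this thread (the `-Edefault.path.*` flags and the `elasticsearch-systemd-pre-exec` hook), something like the sketch below might help. The function name `unit_is_stale` is made up for illustration, and it only looks for those two markers; it does not prove that a file is a correct 6.0 unit.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the unit file still has
# uncommented 5.x-era lines, i.e. -Edefault.path.* flags or the
# elasticsearch-systemd-pre-exec hook. Lines starting with '#' are ignored.
unit_is_stale() {
    unit_file="$1"
    grep -v '^[[:space:]]*#' "$unit_file" \
        | grep -q -e 'default\.path\.' -e 'elasticsearch-systemd-pre-exec'
}

# Example against the path from this thread:
# unit_is_stale /usr/lib/systemd/system/elasticsearch.service \
#     && echo "unit file still looks like the 5.x version"
```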


This solved my problem, thank you very much @medcl.net. This is my elasticsearch.service file now:
[Unit]
Description=Elasticsearch
Documentation=http://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
Environment=ES_HOME=/usr/share/elasticsearch
Environment=CONF_DIR=/etc/elasticsearch
Environment=DATA_DIR=/var/lib/elasticsearch
Environment=LOG_DIR=/var/log/elasticsearch
Environment=PID_DIR=/var/run/elasticsearch
EnvironmentFile=-/etc/sysconfig/elasticsearch

WorkingDirectory=/usr/share/elasticsearch

User=elasticsearch
Group=elasticsearch

#ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec
ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet
#ExecStart=/usr/share/elasticsearch/bin/elasticsearch \
#    -p ${PID_DIR}/elasticsearch.pid \
#    --quiet \
#    -Edefault.path.logs=${LOG_DIR} \
#    -Edefault.path.data=${DATA_DIR} \
#    -Edefault.path.conf=${CONF_DIR}


Please make sure you upgrade the service files when you upgrade Elasticsearch.

You should not be changing these! We may make changes to them, and any changes you need are better made in other config files, to stop issues like this from happening.

I had a similar experience when upgrading my dev cluster (thankfully I figured this out before rolling out to prod). As identified above, the old service file starts Elasticsearch with default paths for logs, data and conf. The new service file omits these.

To resolve this in my case, I removed the reliance on the now-missing defaults by adding the paths to my config.

e.g. /etc/elasticsearch/elasticsearch.yml config before

cluster.name: MyCluster
network.host: 0.0.0.0
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"

Config after

cluster.name: MyCluster
network.host: 0.0.0.0
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

This seems to resolve the issue for me without the need to change /usr/lib/systemd/system/elasticsearch.service
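
A quick way to see whether a config file is missing these two settings could look like the sketch below. The function name `missing_path_settings` is made up for illustration, and it only recognises the flat `path.data:` / `path.logs:` form used above, not a nested `path:` block.

```shell
#!/bin/sh
# Hypothetical helper: prints each of path.data / path.logs that has no
# uncommented, flat-style entry in the given elasticsearch.yml.
# Prints nothing when both are present.
missing_path_settings() {
    yml="$1"
    grep -q '^path\.data[[:space:]]*:' "$yml" || echo "path.data"
    grep -q '^path\.logs[[:space:]]*:' "$yml" || echo "path.logs"
}

# Example against the path from this thread:
# missing_path_settings /etc/elasticsearch/elasticsearch.yml
```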


If you are on an RPM-based system, make sure it is using the new service file:

mv /usr/lib/systemd/system/elasticsearch.service /usr/lib/systemd/system/elasticsearch.service.old
mv /usr/lib/systemd/system/elasticsearch.service.rpmnew /usr/lib/systemd/system/elasticsearch.service
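
The same swap can be wrapped so that it does nothing when there is no .rpmnew file to move into place. This is only a sketch; the function name `swap_rpmnew` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: if "<file>.rpmnew" exists next to <file>, keep the
# current file as "<file>.old" and move the packaged version into place.
# Returns non-zero and changes nothing when there is no .rpmnew file.
swap_rpmnew() {
    current="$1"
    packaged="$1.rpmnew"
    if [ ! -f "$packaged" ]; then
        echo "no $packaged found; nothing to do" >&2
        return 1
    fi
    mv "$current" "$current.old" && mv "$packaged" "$current"
}

# Example (needs root on a real system):
# swap_rpmnew /usr/lib/systemd/system/elasticsearch.service && systemctl daemon-reload
```

Running `systemctl daemon-reload` afterwards makes systemd pick up the replaced unit file.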


I upgraded ES from v5.6.2 to v6, and /usr/lib/systemd/system/elasticsearch.service seemed to be correct, but my Elasticsearch had the same problem.

@TheFoolsErrand's answer made my ES work.
Thank you.

It seems that multiple things need to be taken care of, depending on your settings.
I upgraded from a working 5.6.1 to 6.0.1. In doing so I had to do all of the following:

Added to /etc/elasticsearch/elasticsearch.yml as mentioned by @TheFoolsErrand :
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

Use the new service file as mentioned by @dlorent :
mv /usr/lib/systemd/system/elasticsearch.service /usr/lib/systemd/system/elasticsearch.service.old
mv /usr/lib/systemd/system/elasticsearch.service.rpmnew /usr/lib/systemd/system/elasticsearch.service

Maybe it goes without saying, but if you are now using a new service file, you need to reapply any modifications you had made to the original service file.
In my case I have bootstrap.memory_lock: true in my elasticsearch.yml, therefore I needed to add LimitMEMLOCK=infinity to the new elasticsearch.service file.
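
An alternative to editing the packaged unit file is a systemd drop-in, which lives outside the package and so should not be touched by the next upgrade. A minimal sketch, assuming the usual drop-in location /etc/systemd/system/elasticsearch.service.d; the function name `write_memlock_override` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: writes a systemd drop-in that raises the memlock limit.
# $1 is the drop-in directory, e.g. /etc/systemd/system/elasticsearch.service.d
write_memlock_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nLimitMEMLOCK=infinity\n' > "$dropin_dir/override.conf"
}

# Example (needs root on a real system):
# write_memlock_override /etc/systemd/system/elasticsearch.service.d \
#     && systemctl daemon-reload
```

After `systemctl daemon-reload`, restart the elasticsearch service so the new limit applies.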
