I have 25 Elasticsearch servers in 6 different clusters, all running version 2.3.1-1 as a service on OEL7. I am getting random issues where Elasticsearch won't start after a reboot because the /run/elasticsearch directory has been removed and the elasticsearch user does not have permission to re-create it. I then have to create the directory as root, change its owner to elasticsearch, and start Elasticsearch manually. Is this a bug? I find it odd that it does not happen every time.
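For reference, the manual recovery described above could be scripted roughly as follows. This is only a sketch of the workaround, not a fix for the underlying cause; the function name `ensure_pid_dir` is made up for illustration, and the path and owner are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: re-create a PID directory and hand it to the service user.
# Usage: ensure_pid_dir <directory> <owner>
ensure_pid_dir() {
    dir="$1"
    owner="$2"
    # mkdir -p succeeds whether or not the directory already exists
    mkdir -p "$dir" || return 1
    # chown needs root when the owner differs from the current user
    chown "$owner" "$dir" || return 1
}

# Example, run as root on an affected node:
#   ensure_pid_dir /var/run/elasticsearch elasticsearch && systemctl start elasticsearch
```

The function is safe to run repeatedly, so it can be used on nodes where the directory survived the reboot as well.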
Do you mean /var/run/elasticsearch? I don't believe we put anything in
Are you using the RPM package from ES?
Also a note, we don't officially support OEL, although it'll likely work fine since it's based on RHEL. Just wanted to leave a note about that in general, in case OEL is doing something funky.
Thank you for your response. I say /run/elasticsearch just because /var/run is a link to it but yes same thing and it is the Redhat kernel we are using so OEL should not be an issue. I am using the RPM package from Elasticsearch.
I'm honestly not sure, I polled some folks internally and we haven't seen that behavior before (on OEL, RHEL or otherwise). Something funky is going on for sure.
Do you notice it happening on the same hosts repeatedly, or does it "rotate" through the nodes so that different nodes are affected after each restart? Any chance there's a cron job or similar that's doing accidental "cleanup"?
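To answer the "same hosts or rotating" question with data, one option is to record the state of the directory on every node right after each reboot. The sketch below is an assumption about how that might be done; `check_pid_dir` is a hypothetical name, and the log path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists and has the expected owner.
# Prints "ok", "missing", or "wrong-owner:<name>" and returns non-zero on a problem.
check_pid_dir() {
    dir="$1"
    expected_owner="$2"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    # The third field of `ls -ld` is the owning user
    actual_owner=$(ls -ld "$dir" | awk '{print $3}')
    if [ "$actual_owner" != "$expected_owner" ]; then
        echo "wrong-owner:$actual_owner"
        return 1
    fi
    echo "ok"
}

# Example (placeholder log path), run shortly after boot on each node:
#   echo "$(date) $(hostname) $(check_pid_dir /var/run/elasticsearch elasticsearch)" >> /tmp/es-pid-dir.log
```

Comparing the collected lines across reboots would show whether specific hosts are affected repeatedly or whether the failure moves around.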
This may work, though I'm not sure.
What if you create /var/run/elasticsearch and add an entry for it in fstab so it mounts automatically? That would mount the path at reboot, before application services come up.
I am not sure this would work, but I don't see why it wouldn't either.
Sorry it took me a while to get back to this. It does seem to rotate through the nodes and is inconsistent (Elasticsearch may start fine one time, fail to start after the next reboot, fail again after another, then suddenly start after the one after that). There is no cron job performing cleanup, but I am not sure I understand what you mean by this. The directory would be cleared out upon reboot anyway, and there is even a comment in the startup script that says so - # Ensure that the PID_DIR exists (it is cleaned at OS startup time)
I am using the RPM package I downloaded from Elasticsearch. I see that other people who have had this issue changed the PID file location to /var/run instead of /var/run/elasticsearch. I tried that in /etc/init.d/elasticsearch and /usr/lib/systemd/system/elasticsearch.service, but the service failed to start, so I had to revert the change.
Thank you, Mark. If I have to go that way I will give it a try, but I am hoping we can figure out why this is happening without mucking around with the mounts.