ElasticSearch 1.7.3 crashed and doesn't restart - no logs


#1

System info:

  • Ubuntu 14.04 64 bit LTS
  • ElasticSearch 1.7.3 from Elastic repo

I've been using that install for quite a while with no problems. It suddenly crashed for an unknown reason. Unfortunately, there is no log about the crash, and starting it using 'service elasticsearch start' doesn't work: it hangs for about 10 seconds, then fails, and no logs are generated in /var/log/elasticsearch about it (yes, there are logs, but they are from before it crashed).

Here are the last few lines in the log file, in case that's of any use (indexing_slowlog and search_slowlog are both empty):

[2015-11-30 10:23:17,551][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.01.30] update_mapping [monit] (dynamic)
[2015-11-30 10:57:01,403][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [snort] (dynamic)
[2015-11-30 11:43:31,081][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [sophos] (dynamic)
[2015-11-30 12:26:45,955][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [fortinet] (dynamic)
[2015-11-30 12:28:10,743][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [fortinet] (dynamic)
[2015-11-30 13:36:47,798][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [auth] (dynamic)
[2015-11-30 13:37:04,885][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [auth] (dynamic)
[2015-11-30 14:34:35,672][INFO ][cluster.metadata         ] [Tomorrow Man] [logstash-2015.11.30] update_mapping [iptables-dropped] (dynamic)
[2015-11-30 14:51:38,409][WARN ][monitor.jvm              ] [Tomorrow Man] [gc][young][2427240][80285] duration [1s], collections [1]/[1.7s], total [1s]/[1.7h], memory [10.6gb]->[10.2gb]/[18.9gb], all_pools {[young] [396.7mb]->[1mb]/[399.4mb]}{[survivor] [24.4mb]->[23.2mb]/[49.8mb]}{[old] [10.2gb]->[10.2gb]/[18.5gb]}
[2015-11-30 14:57:44,552][WARN ][monitor.jvm              ] [Tomorrow Man] [gc][young][2427567][80301] duration [4.1s], collections [1]/[5.1s], total [4.1s]/[1.7h], memory [10.6gb]->[10.2gb]/[18.9gb], all_pools {[young] [398.5mb]->[9.5mb]/[399.4mb]}{[survivor] [22.2mb]->[22.9mb]/[49.8mb]}{[old] [10.2gb]->[10.2gb]/[18.5gb]}

I checked whether /var/run/elasticsearch has the right ownership and permissions, and it does (owned by the 'elasticsearch' user):

~# ls /var/run/ -al | grep elastic
drwxr-xr-x  2 elasticsearch elasticsearch   60 Nov  2 12:23 elasticsearch

I found http://sandlininc.com/?p=747 and added the log line

log_daemon_msg "sudo -u $ES_USER $DAEMON $DAEMON_OPTS"

right before the startup. Starting it with the sudo command displayed on the screen works fine (and displays no errors) but starting it through 'service' still fails:

sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch

Unfortunately, updating to 2.1.0 is not an option at this time since another tool we use depends on 1.7.3 to work.

As mentioned by someone else, there is more than enough disk space (more than 250GB left). RAM isn't an issue either: the system has 24GB and 19GB is given to ElasticSearch (ES_HEAP_SIZE=19g, in /etc/default/elasticsearch).

Is there anything else I can do or look for to debug this issue?


(Magnus Bäck) #2

I'd run the init script with -x so that it logs all commands being run:

sudo bash -x /etc/init.d/elasticsearch start

#3

I might be wrong but it doesn't look like it contains anything useful.

Here is the complete output (part 1/2):

+ PATH=/bin:/usr/bin:/sbin:/usr/sbin
+ NAME=elasticsearch
+ DESC='Elasticsearch Server'
+ DEFAULT=/etc/default/elasticsearch
++ id -u
+ '[' 0 -ne 0 ']'
+ . /lib/lsb/init-functions
+++ run-parts --lsbsysinit --list /lib/lsb/init-functions.d
++ for hook in '$(run-parts --lsbsysinit --list /lib/lsb/init-functions.d 2>/dev/null)'
++ '[' -r /lib/lsb/init-functions.d/20-left-info-blocks ']'
++ . /lib/lsb/init-functions.d/20-left-info-blocks
++ for hook in '$(run-parts --lsbsysinit --list /lib/lsb/init-functions.d 2>/dev/null)'
++ '[' -r /lib/lsb/init-functions.d/50-ubuntu-logging ']'
++ . /lib/lsb/init-functions.d/50-ubuntu-logging
+++ LOG_DAEMON_MSG=
++ FANCYTTY=
++ '[' -e /etc/lsb-base-logging.sh ']'
++ true
+ '[' -r /etc/default/rcS ']'
+ . /etc/default/rcS
++ UTC=yes
+ ES_USER=elasticsearch
+ ES_GROUP=elasticsearch
+ JDK_DIRS='/usr/lib/jvm/java-8-oracle/ /usr/lib/jvm/j2sdk1.8-oracle/ /usr/lib/jvm/jdk-7-oracle-x64 /usr/lib/jvm/java-7-oracle /usr/lib/jvm/j2sdk1.7-oracle/ /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-amd64/ /usr/lib/jvm/java-7-openjdk-armhf /usr/lib/jvm/java-7-openjdk-i386/ /usr/lib/jvm/default-java'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-8-oracle//bin/java -a -z '' ']'
+ JAVA_HOME=/usr/lib/jvm/java-8-oracle/
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/j2sdk1.8-oracle//bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/jdk-7-oracle-x64/bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-oracle/bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/j2sdk1.7-oracle//bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk/bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk-amd64//bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk-armhf/bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/java-7-openjdk-i386//bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ for jdir in '$JDK_DIRS'
+ '[' -r /usr/lib/jvm/default-java/bin/java -a -z /usr/lib/jvm/java-8-oracle/ ']'
+ export JAVA_HOME
+ ES_HOME=/usr/share/elasticsearch
+ MAX_OPEN_FILES=65535
+ LOG_DIR=/var/log/elasticsearch
+ DATA_DIR=/var/lib/elasticsearch
+ WORK_DIR=/tmp/elasticsearch
+ CONF_DIR=/etc/elasticsearch
+ CONF_FILE=/etc/elasticsearch/elasticsearch.yml
+ MAX_MAP_COUNT=262144
+ PID_DIR=/var/run/elasticsearch
+ '[' -f /etc/default/elasticsearch ']'
+ . /etc/default/elasticsearch
++ ES_HEAP_SIZE=19g
+ PID_FILE=/var/run/elasticsearch/elasticsearch.pid
+ DAEMON=/usr/share/elasticsearch/bin/elasticsearch
+ DAEMON_OPTS='-d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch'
+ export ES_HEAP_SIZE
+ export ES_HEAP_NEWSIZE
+ export ES_DIRECT_SIZE
+ export ES_JAVA_OPTS
+ test -x /usr/share/elasticsearch/bin/elasticsearch
+ case "$1" in
+ checkJava
+ '[' -x /usr/lib/jvm/java-8-oracle//bin/java ']'
+ JAVA=/usr/lib/jvm/java-8-oracle//bin/java
+ '[' '!' -x /usr/lib/jvm/java-8-oracle//bin/java ']'
+ '[' -n '' -a -z 19g ']'
+ log_daemon_msg 'Starting Elasticsearch Server'
+ '[' -z 'Starting Elasticsearch Server' ']'
+ log_use_fancy_output
+ TPUT=/usr/bin/tput
+ EXPR=/usr/bin/expr
+ '[' -t 1 ']'
+ '[' xxterm '!=' x ']'
+ '[' xxterm '!=' xdumb ']'
+ '[' -x /usr/bin/tput ']'
+ '[' -x /usr/bin/expr ']'
+ /usr/bin/tput hpa 60
+ /usr/bin/tput setaf 1
+ '[' -z ']'
+ FANCYTTY=1
+ case "$FANCYTTY" in
+ true

#4

+ /usr/bin/tput xenl
++ /usr/bin/tput cols
+ COLS=144
+ '[' 144 ']'
+ '[' 144 -gt 6 ']'
++ /usr/bin/expr 144 - 7
+ COL=137
+ log_use_plymouth
+ '[' n = y ']'
+ plymouth --ping
+ printf ' * Starting Elasticsearch Server       '
 * Starting Elasticsearch Server       ++ /usr/bin/expr 144 - 1
+ /usr/bin/tput hpa 143
                                                                                                                                               + printf ' '
 ++ pidofproc -p /var/run/elasticsearch/elasticsearch.pid elasticsearch
++ local pidfile base status specified pid OPTIND
++ pidfile=
++ specified=
++ OPTIND=1
++ getopts p: opt
++ case "$opt" in
++ pidfile=/var/run/elasticsearch/elasticsearch.pid
++ specified=specified
++ getopts p: opt
++ shift 2
++ '[' 1 -ne 1 ']'
++ base=elasticsearch
++ '[' '!' specified ']'
++ '[' -n /var/run/elasticsearch/elasticsearch.pid -a -r /var/run/elasticsearch/elasticsearch.pid ']'
++ read pid
++ '[' -n '' ']'
++ '[' -n specified ']'
++ '[' -e /var/run/elasticsearch/elasticsearch.pid -a '!' -r /var/run/elasticsearch/elasticsearch.pid ']'
++ return 3
+ pid=
+ '[' -n '' ']'
+ mkdir -p /var/log/elasticsearch /var/lib/elasticsearch /tmp/elasticsearch
+ chown elasticsearch:elasticsearch /var/log/elasticsearch /var/lib/elasticsearch /tmp/elasticsearch
+ '[' -n /var/run/elasticsearch ']'
+ '[' '!' -e /var/run/elasticsearch ']'
+ '[' -n /var/run/elasticsearch/elasticsearch.pid ']'
+ '[' '!' -e /var/run/elasticsearch/elasticsearch.pid ']'
+ '[' -n 65535 ']'
+ ulimit -n 65535
+ '[' -n '' ']'
+ '[' -n 262144 -a -f /proc/sys/vm/max_map_count ']'
+ sysctl -q -w vm.max_map_count=262144
+ log_daemon_msg 'sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch'
+ '[' -z 'sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch' ']'
+ log_use_fancy_output
+ TPUT=/usr/bin/tput
+ EXPR=/usr/bin/expr
+ '[' -t 1 ']'
+ '[' xxterm '!=' x ']'
+ '[' xxterm '!=' xdumb ']'
+ '[' -x /usr/bin/tput ']'
+ '[' -x /usr/bin/expr ']'
+ /usr/bin/tput hpa 60
+ /usr/bin/tput setaf 1
+ '[' -z 1 ']'
+ true
+ case "$FANCYTTY" in
+ true
+ /usr/bin/tput xenl
++ /usr/bin/tput cols
+ COLS=144
+ '[' 144 ']'
+ '[' 144 -gt 6 ']'
++ /usr/bin/expr 144 - 7
+ COL=137
+ log_use_plymouth
+ '[' n = y ']'
+ plymouth --ping
+ printf ' * sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch       '
 * sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch       ++ /usr/bin/expr 144 - 1
+ /usr/bin/tput hpa 143
                                                                                                                                               + printf ' '
 + start-stop-daemon --start -b --user elasticsearch -c elasticsearch --pidfile /var/run/elasticsearch/elasticsearch.pid --exec /usr/share/elasticsearch/bin/elasticsearch -- -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch
+ return=0
+ '[' 0 -eq 0 ']'
+ i=0
+ timeout=10
+ exit 0

(Magnus Bäck) #5

Hmm, okay. It looks like it's actually trying to start ES (i.e. the init script isn't dying earlier on). I'd try sudo strace -f bash /etc/init.d/elasticsearch start (pipe stdout and stderr to a file!) to get further clues, but beware that reading the strace output isn't necessarily easy.
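For anyone following along, here is a minimal sketch of capturing the trace to a file and pulling out the lines that are usually most telling. The path /tmp/es-init.strace and both function names are made up for illustration; -f and -o are standard strace options. The filter is only a first pass and will not catch every kind of failure.

```shell
# Hypothetical location for the saved trace; any writable path works.
TRACE_FILE=/tmp/es-init.strace

# Run the init script under strace.
# -f follows child processes, -o writes the trace to a file instead of stderr.
capture_trace() {
    sudo strace -f -o "$1" bash /etc/init.d/elasticsearch start
}

# First-pass filter over a saved trace: which programs were executed,
# which processes exited (and with what status), and which syscalls
# failed with a permission or memory error.
summarize_trace() {
    grep -E 'execve\(|exited with|= -1 E(ACCES|PERM|NOMEM)' "$1"
}

# Usage (not run automatically):
#   capture_trace "$TRACE_FILE"
#   summarize_trace "$TRACE_FILE" | less
```

The filter deliberately skips ENOENT, because PATH lookups produce many harmless "No such file" results that would drown out the rest.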


#6

Where would you like me to send it to? The file is fairly large (~500K uncompressed, ~250K compressed).


(Magnus Bäck) #7

I was hoping you'd be able to analyze it yourself. I don't have time to do it. Maybe someone else here can chip in. I'd share the file on Google Drive, Dropbox, or a similar service.


#8

I'll open a bug report.


(Mark Walkom) #9

Run this yourself; it might tell you something more:

sudo -u elasticsearch /usr/share/elasticsearch/bin/elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch

#10

warkolm, I mentioned that I did it in my first post and that works fine (and doesn't return logs). Running it via 'service' (or /etc/init.d/elasticsearch) fails.


(Clinton Gormley) #11

... in which case I'd try the same thing that @warkolm suggested but remove the -d option to run Elasticsearch in the foreground (and log to STDERR).

Also check what you have in /etc/default/elasticsearch, as that is used when starting as a service. Btw, 19GB of heap out of 24GB total is not a good ratio: you are limiting the effectiveness of the file system cache. I'd also check what else is using memory on your system, in case you have processes which are competing with each other.
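As a rough way to check the ratio, the sketch below compares the configured heap with half of the machine's RAM, which is the commonly quoted upper bound for the Elasticsearch heap. The function names are made up for illustration, and the 50% figure is a rule of thumb rather than a hard limit.

```shell
# Print half of total RAM in whole gigabytes, read from a meminfo-style file.
# MemTotal is reported in kB, and 1 GB = 1048576 kB.
half_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 / 2 }' "${1:-/proc/meminfo}"
}

# Print the heap size the service is configured with, if the variable is set.
configured_heap() {
    grep -E '^[[:space:]]*ES_HEAP_SIZE=' "${1:-/etc/default/elasticsearch}" \
        | tail -n 1 | cut -d= -f2
}

# Usage (not run automatically):
#   echo "configured heap: $(configured_heap)"
#   echo "half of RAM:     $(half_ram_gb)g"
#   free -g    # shows what else is using memory right now
```

Because the kernel reserves some memory, MemTotal is usually a little below the nominal RAM size, so the printed value can come out one lower than half of the advertised figure.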


#12

His command included -d, but I had already tried it without that option (someone else suggested it yesterday on IRC), and it starts just fine. I can give you the output if you'd like.

Should I drop to 12GB?


#13

What should I try next?


(Mike Simos) #14

Hi,

I'd edit /etc/elasticsearch/logging.yml and change the log level to DEBUG:

es.logger.level: DEBUG

From the initial post it seems like you may be logging only at INFO level. If you set this to DEBUG you may get some additional information as to what's happening. Then restart Elasticsearch and check the log file again.


#15

I stopped it, added that line in /etc/elasticsearch/logging.yml (just had to edit the second line and replace INFO by DEBUG), cleared the log directory and started it using 'service elasticsearch start'.

It hung for 5-10 secs but it didn't generate any logs.

Could this be a problem with the JDK?


(Mike Simos) #16

Hi,

Run:

sudo strace -s 99 -f bash /etc/init.d/elasticsearch start

Then paste the last 10 lines here after waiting about 1 minute.


#17

I output stdout and stderr to a file that I kept just in case. It exited by itself within 10 seconds. Anyway, here are the 10 lines as requested:

rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f9b9938ad40}, {0x4438a0, [], SA_RESTORER, 0x7f9b9938ad40}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=25287, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffe20c75ed8, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = 0
write(1, "   ...fail!\n", 12)           = 12
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(1)                           = ?
+++ exited with 1 +++

(Mike Simos) #18

Hi,

You probably need to provide some earlier messages, since this only shows that it failed and doesn't give any indication why. You can use something like pastebin to dump the whole thing.
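If the whole trace is too large to share, one option is to cut out just the lines leading up to the failure message. A minimal sketch, assuming the strace output was saved to a file and that your grep supports -m and -B (GNU grep does); the function name and the default of 200 lines are arbitrary choices for illustration:

```shell
# Print the lines leading up to (and including) the first "...fail!" write
# in a saved trace. $1 is the trace file, $2 the number of context lines.
context_before_fail() {
    trace_file=$1
    lines=${2:-200}
    # -F treats the pattern as a fixed string, -m 1 stops at the first match,
    # -B prints that many lines of context before it.
    grep -m 1 -B "$lines" -F -e '...fail!' "$trace_file"
}

# Usage (not run automatically), with a hypothetical file name:
#   context_before_fail /tmp/es-init.strace 300 > /tmp/es-init.excerpt
```

The excerpt should be small enough for a paste service, although the cause may sit earlier in the trace than the chosen window.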


#19

It's too big for pastebin. The file is 5.41MB. Do you have an email address I can send it to?


(Mike Simos) #20

Try using Google Drive, Box, gist.github.com, etc.