Elasticsearch hangs after reinstall


(JiGe) #1

Elasticsearch version: 5.0.0-alpha1

Plugins installed: []

JVM version: 1.8.0_60

OS version: SUSE Enterprise 12

Description of the problem including expected versus actual behavior:
I removed 5.0.0-alpha1 with rpm -ev, then installed elasticsearch-2.4.1. Expected: Elasticsearch starts. Actual: Elasticsearch fails to start.

Steps to reproduce:

  1. Install elasticsearch-5.0.0-alpha1
  2. Uninstall elasticsearch-5.0.0-alpha1
  3. Install elasticsearch-2.4.1
  4. Start elasticsearch with sudo systemctl start elasticsearch
  5. ps -ef | grep elastic
    elastic+ 6400 1 0 03:17 pts/1 00:00:00 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.4.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -d -p /var/run/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch
  6. /bin/netstat -nap | grep 9200 finds nothing.
    curl -XGET localhost:9200 also fails, even though network.host in elasticsearch.yml is set to 0.0.0.0.

Provide logs (if relevant): No logs under /var/log/elasticsearch
Describe the feature:
jstack output for the PID:
"Signal Dispatcher" #5 daemon prio=9 os_prio=0 tid=0x00007fcfc0172000 nid=0x4dd0 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" #4 daemon prio=9 os_prio=0 tid=0x00007fcfc0170800 nid=0x4dcf waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fcfc0134000 nid=0x4dce in Object.wait() [0x00007fcf734fb000]
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000c00070b8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000000c00070b8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fcfc0132000 nid=0x4dcd in Object.wait() [0x00007fcf735fc000]
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000c0006af8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
    - locked <0x00000000c0006af8> (a java.lang.ref.Reference$Lock)

"main" #1 prio=5 os_prio=0 tid=0x00007fcfc0009800 nid=0x4db9 runnable [0x00007fcfc7955000]
java.lang.Thread.State: RUNNABLE
    at sun.nio.fs.UnixNativeDispatcher.stat0(Native Method)
    at sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:286)
    at sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:70)
    at sun.nio.fs.UnixFileStore.devFor(UnixFileStore.java:55)
    at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:70)
    at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:48)
    at sun.nio.fs.LinuxFileSystem.getFileStore(LinuxFileSystem.java:112)
    at sun.nio.fs.UnixFileSystem$FileStoreIterator.readNext(UnixFileSystem.java:213)
    at sun.nio.fs.UnixFileSystem$FileStoreIterator.hasNext(UnixFileSystem.java:224)
    - locked <0x00000000c0b74e48> (a sun.nio.fs.UnixFileSystem$FileStoreIterator)
    at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:515)
    at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:459)
    at org.apache.lucene.util.IOUtils.spins(IOUtils.java:448)
    at org.elasticsearch.env.ESFileStore.<init>(ESFileStore.java:57)
    at org.elasticsearch.env.Environment.<init>(Environment.java:90)
    at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:81)
    at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:107)
    at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:100)
    at org.elasticsearch.bootstrap.BootstrapCLIParser.<init>(BootstrapCLIParser.java:48)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:242)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

"VM Thread" os_prio=0 tid=0x00007fcfc012d000 nid=0x4dcc runnable

"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007fcfc001a800 nid=0x4dba runnable

"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007fcfc001c000 nid=0x4dbb runnable

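A note on reading the dump above: the "main" thread is RUNNABLE inside the native stat0 call while FileStoreIterator walks the mounted file systems, so a stat on an unresponsive mount point would block at exactly this place. The following Python sketch shows one way to pull the frames of a single thread out of a saved jstack dump and check for that pattern. The function name thread_frames and the abbreviated SAMPLE text are hypothetical, made up for illustration; they are not part of any Elasticsearch or JDK tooling.

```python
from typing import List


def thread_frames(dump: str, thread_name: str) -> List[str]:
    """Return the 'at ...' frames of the named thread in a jstack dump."""
    frames = []
    in_thread = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # A line starting with a quoted name opens a new thread section.
            in_thread = stripped.startswith(f'"{thread_name}"')
            continue
        if in_thread and stripped.startswith("at "):
            frames.append(stripped[3:])
    return frames


# Abbreviated sample taken from the dump in this post (assumption: the
# real dump would be read from a file saved with `jstack <pid>`).
SAMPLE = '''\
"Finalizer" #3 daemon prio=8
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)

"main" #1 prio=5 os_prio=0 tid=0x00007fcfc0009800 nid=0x4db9 runnable
java.lang.Thread.State: RUNNABLE
    at sun.nio.fs.UnixNativeDispatcher.stat0(Native Method)
    at sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:286)
    at sun.nio.fs.UnixFileSystem$FileStoreIterator.readNext(UnixFileSystem.java:213)
    at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:515)
'''

frames = thread_frames(SAMPLE, "main")
top_frame = frames[0]
# True when the thread is somewhere inside the mount enumeration.
stuck_in_mount_scan = any("FileStoreIterator" in f for f in frames)
print(top_frame, stuck_in_mount_scan)
```

If several dumps taken a few seconds apart all show the same top frame, that suggests the thread is blocked rather than merely busy.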

(Ed) #2

I don't believe your data nodes can be downgraded from 5.0 to 2.4. There are breaking changes from 2.4 to 5.0 (well, actually upgrades to the data store), and in reverse those would possibly prevent the downgrade (unless the release notes say you can do it).

You probably want to clear out the --default.path.data=/var/lib/elasticsearch directory and any of the other work directories to make sure you have purged the data stores.


(JiGe) #3

I'm sure that I have removed all the related elasticsearch files.
And I can't find any other java process with ps -ef.
It is reproducible; you can test it in your environment.


(Ed) #4

You're going to have to open an issue on GitHub to have someone fix this.

But failing to start would produce some logs. If it is not starting, there is probably an issue with the startup script, not Elastic. Also, if you're able to get a thread dump and a PID, it sounds like it has started. Maybe you could explain a little more?

You are using the alpha of 5.0; I would use 5.2. You really don't want to use an older version of Elastic (2.4) now.


(JiGe) #5

I opened an issue on GitHub, but it was closed and they told me to open a Discuss topic here.
I also tried ES 5.x and it has the same hang problem.
I also tried to start ES directly with this command:
"/usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.4.2.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.conf=/etc/elasticsearch"
The ES process hangs again.
No logs are created:
c4dev@si-portal-server:~> ls /var/log/elasticsearch/
c4dev@si-portal-server:~>


(Ed) #6

what is in the log file?


(Ed) #7

Check SELinux: is it off?


(Ed) #8

what happens if you start it manually?


(JiGe) #9
  1. ps -ef | grep elastic
    elastic+ 6400 1 0 03:17 pts/1 00:00:00 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.4.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -d -p /var/run/elasticsearch.pid --default.config=/etc/elasticsearch/elasticsearch.yml --default.path.home=/usr/share/elasticsearch --default.path.logs=/var/log/elasticsearch --default.path.data=/var/lib/elasticsearch --default.path.work=/tmp/elasticsearch --default.path.conf=/etc/elasticsearch
  2. /bin/netstat -nap | grep 9200 finds nothing.
    curl -XGET localhost:9200 also fails, even though network.host in elasticsearch.yml is set to 0.0.0.0.
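
The netstat and curl checks above can also be scripted. This is a minimal Python sketch, assuming the node would listen on the default HTTP port 9200 on the local machine; the helper name port_is_open is hypothetical. It only tells you whether something accepts TCP connections on the port, not whether Elasticsearch is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9200 is the default Elasticsearch HTTP port mentioned in this thread.
    print("9200 open:", port_is_open("127.0.0.1", 9200))
```

A result of False here while the java process shows up in ps matches what is described in this post: the JVM is running but never got far enough to bind the HTTP port.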

(Ed) #10

try lsof -p 6400


(JiGe) #11

I tried again after disabling SELinux. Same result. I have opened an issue on GitHub.


(JiGe) #12

c4dev@si-portal-server:~> lsof -p 6400
Curiously, the lsof command hangs too.


(Ed) #13

Just as a guess: check your DNS resolution. Do you have any NFS mounts? Check that they are mounted and not stale.
You have to run lsof as root or as the elastic user, otherwise you will not have enough privileges.

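The stale-mount guess above can be tested without hanging your own shell. Below is a minimal Python sketch, assuming a Linux host with /proc/mounts and a stat binary on the PATH; the helper names parse_mount_points and mount_responds and the 3-second timeout are hypothetical choices. Each mount point is probed in a child process, because a stat call against a dead NFS server can block indefinitely and would otherwise freeze the script itself. Whether a stale mount is really the cause here is not confirmed anywhere in this thread.

```python
import subprocess
from typing import List


def parse_mount_points(mounts_text: str) -> List[str]:
    """Extract mount points (second field) from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # /proc/mounts escapes spaces in paths as \040.
            points.append(fields[1].replace("\\040", " "))
    return points


def mount_responds(path: str, timeout: float = 3.0) -> bool:
    """Return False if `stat` on path does not finish within timeout seconds."""
    proc = subprocess.Popen(
        ["stat", "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Do not wait again: a child blocked on a dead server may be
        # unkillable until the mount recovers.
        proc.kill()
        return False


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        for mount_point in parse_mount_points(fh.read()):
            if not mount_responds(mount_point):
                print("possibly stale or hung mount:", mount_point)
```

Popen with wait(timeout=...) is used instead of subprocess.run because run waits for the child again after killing it on timeout, which could block on a process stuck in uninterruptible I/O. Any mount point reported by the script is a candidate for a lazy unmount or a remount before starting Elasticsearch again.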
