We are using Elasticsearch version 0.90.7, and the following gist shows the log: https://gist.github.com/jasonwee/8282477. We are not sure what is happening with
"Exception cause unwrapping ran for 10 levels" and whether it is a concern. The
exception shows up after some time. Any idea?
ES is unable to transport an exception back to the client; exceptions are
thrown, passed back and forth, and the procedure is retried until the Java
stack is full (StackOverflowError). Yes, it is a concern, and it is a bug: a
stack overflow must not happen.
Is the gist the whole stack trace? I bet you only posted the first few
lines...
In the meantime, can you check whether you run the same JVM version on the
server and the client side? A transport exception that cannot be deserialized
is typical for a JVM mismatch.
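A quick way to double-check is to print the JVM properties from inside the process on each machine, not just on the shell; a minimal sketch:

// Minimal sketch: print the JVM details of the running process.
// Run (or log) the same properties on both server and client to rule out a
// JVM mismatch.
public final class PrintJvmInfo {
    public static void main(String[] args) {
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("java.vendor     = " + System.getProperty("java.vendor"));
    }
}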
The log file size is 427 MB, with 5,463,550 lines in total; if I pasted the
entire file into a gist, it would exceed GitHub's limit. After gzipping, the
file is 5.9 MB. If you have a server, I can upload the gzipped log to it.
The server and client have both been running the same JVM version the entire time:
$ java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)
I was not speaking of the whole log file, but of the whole stack trace, just to
make sure where the culprit started. If there was no OOM or anything else
around it, that would be surprising.
Sure, my bad. There is no OOM, and here is the second gist: gist:8294514 · GitHub. The exceptions are repetitive,
which explains the large number of lines in the log file.
Does the second gist help to determine the problem? I can get the full log if
it does not show where the cause might be.
Yes, it looks like two nodes do not agree about an update action, and a
version conflict is pinging between them (node1 and node4).
Not sure if this happens during index recovery or while an update is
executed, but it is definitely worth raising an issue on the Elasticsearch
GitHub so the Elasticsearch core team can have a look. It might be some
kind of deadlock.
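To be clear, a version conflict on its own is a normal part of optimistic concurrency control; it only becomes a bug when the nodes keep bouncing it between themselves as in this log. For comparison, this is roughly how a single, client-visible conflict is normally handled with the 0.90 Java API (a sketch only; the index, type, and id names are made up):

import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.engine.VersionConflictEngineException;

// Hedged sketch: an ordinary, client-visible version conflict and the usual
// re-read-and-retry handling. The conflict in the logs above, however, happens
// between the nodes during the update action itself.
public final class VersionConflictSketch {

    // "myindex", "mytype", "1" are made-up names for illustration only.
    static void reindexWithVersion(Client client, String newSource) {
        for (int attempt = 0; attempt < 3; attempt++) {
            GetResponse get = client.prepareGet("myindex", "mytype", "1")
                    .execute().actionGet();
            try {
                client.prepareIndex("myindex", "mytype", "1")
                        .setSource(newSource)
                        .setVersion(get.getVersion()) // optimistic lock on the version we read
                        .execute().actionGet();
                return; // success
            } catch (VersionConflictEngineException e) {
                // Someone else wrote the document in between; re-read and retry.
            }
        }
        throw new IllegalStateException("still conflicting after 3 attempts");
    }
}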
Today, when I investigated this issue, I ran a query against the timestamp
range in which the exceptions were happening, and data had been indexed during
that period. The reason I ran the query is that we were worried no data was
being indexed while the exceptions were occurring, which would mean data loss.
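In case it helps someone else, that kind of check can be scripted against the 0.90 Java API roughly as follows; the index name, timestamp field, host, cluster name, and time window below are placeholders, not values from this thread:

import org.elasticsearch.action.count.CountResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;

// Hedged sketch: count how many documents were indexed during the window in
// which the exceptions appear. "logs", "@timestamp", the host, the cluster
// name, and the time range are placeholders.
public final class CheckIndexedDuringWindow {
    public static void main(String[] args) {
        TransportClient client = new TransportClient(
                ImmutableSettings.settingsBuilder()
                        .put("cluster.name", "elasticsearch") // assumed cluster name
                        .build());
        client.addTransportAddress(new InetSocketTransportAddress("node1", 9300));
        try {
            CountResponse response = client.prepareCount("logs")
                    .setQuery(QueryBuilders.rangeQuery("@timestamp")
                            .from("2014-01-06T10:00:00")
                            .to("2014-01-06T11:00:00"))
                    .execute().actionGet();
            System.out.println("documents indexed in window: " + response.getCount());
        } finally {
            client.close();
        }
    }
}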