I am working on a project that runs Elasticsearch 0.20.2 in a single-node
configuration. The project runs fine under normal load. Under heavy load
with a mix of writes and reads on an index, however, the JVM crashes with an
EXCEPTION_ACCESS_VIOLATION. We have only seen this crash on Windows. I've
attached the hs_err_pid.log file from the most recent crash. It has occurred
several times on both Java 6 and Java 7, each time with the same Lucene call
reported as the offending frame in an Elasticsearch cache thread.
Any tips or insight into this issue would be greatly appreciated.
I'll set up a test with 7u25 tonight. One of the issues we are experiencing
is that there is no clear way to reproduce the crash. It has happened in
under 30 minutes on one system, while others have run for 5+ days before
crashing.
I have found the Lucene issue LUCENE-5212 ("java 7u40 causes sigsegv and
corrupt term vectors", on the ASF JIRA), which references problems in
releases after 7u25 (fixed in the upcoming 7u60). However, that issue
describes a SIGSEGV (segfault), and I have found very little about
EXCEPTION_ACCESS_VIOLATION crashes in the JVM.
Also attaching the hs_err_pid.log from a Java 6 instance that crashed
(6u35).
On Monday, December 9, 2013 6:41:00 PM UTC-5, Jörg Prante wrote:
I see you were seeing crashes with Java 7u45. There are reports about
crashes with Java > 7u40.
Please don't send log files to the list; you're sending them to
hundreds or thousands of people who don't want them.
Just use something like a gist or pastebin.
We have only seen these crashes on Windows, and they always occur in an
Elasticsearch cache thread. One documented difference that I have found with
Elasticsearch on Windows is the use of mmapfs stores by default
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html),
and in the crash log files we see MMapDirectory$MMapIndexInput in the
"Register to memory mapping" section (consistently, in all of our crashes).
While digging into Lucene's MMapDirectory, I found this Stack Overflow answer
by Uwe Schindler
(http://stackoverflow.com/questions/8224843/jvm-crashes-on-lucene-datainput-readvint),
in which he describes the MMapDirectory implementation in Lucene and notes
that it is not 100% safe and that the calling application must ensure
safety:
"By default MMapDirectory unmaps the files after closing the IndexInputs.
MMapDirectory is not synchronized at all, so when another thread tries to
access the IndexInput after unmapping it will access an unmapped address
and will SIGSEGV.
If your code would be correct this cannot happen, but it looks like you are
using an already closed IndexReader/IndexWriter to access the index. Before
Lucene 3.5 (will come out soon), missing checks in IndexReader will make it
possible that an already closed IndexReader with all its closed (and
unmapped) IndexInputs tries to access index data and segfaults.
In 3.5 we added additional safety checks to prevent this illegal access,
but its not 100% (as synchronization is missing). I would review the code
and check that nothing accesses closed index."
I believe we are hitting a rare case where Elasticsearch has closed an
IndexReader but another thread is still trying to read from it. I have also
seen a similar report on this mailing list, except that it was on SunOS:
https://groups.google.com/forum/#!topic/elasticsearch/e-Eh_guIxA4
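To make the suspected failure mode concrete, here is a minimal sketch of the reference-counting idea that guards against reading a resource after it has been released. It is plain Java with no Lucene dependency. The class RefCountedInput and its methods are hypothetical names made up for illustration, and a byte array stands in for the mapped region. Lucene's IndexReader does have incRef/decRef, but this is not its code, and I have not verified how Elasticsearch actually manages its readers.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a memory-mapped input; NOT a Lucene class. */
final class RefCountedInput {
    // Starts at 1: the reference held by the owner that opened the input.
    private final AtomicInteger refCount = new AtomicInteger(1);
    // Stands in for the mapped region; set to null when "unmapped".
    private volatile byte[] data;

    RefCountedInput(byte[] data) {
        this.data = data;
    }

    /** Takes a reference; fails once the last reference has been dropped. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Drops a reference; whoever drops the last one releases the data. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            data = null; // the real thing would unmap the file here
        } else if (remaining < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }

    /** Reads one byte; callers are expected to hold a reference. */
    byte readByte(int pos) {
        byte[] d = data;
        if (d == null) {
            throw new IllegalStateException("already closed");
        }
        return d[pos];
    }

    boolean isReleased() {
        return data == null;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCountedInput in = new RefCountedInput(new byte[] {10, 20, 30});
        if (in.tryIncRef()) {          // a reader takes its own reference
            in.decRef();               // the owner "closes" the input
            System.out.println(in.readByte(1)); // still valid: reader holds a ref
            in.decRef();               // reader is done; data is released now
        }
        System.out.println(in.tryIncRef()); // no new readers after release
    }
}
```

The point of the sketch is that the owner's close does not release the data while a reader still holds a reference, so the release cannot race with an in-flight read. A plain "is it closed?" check before each read does not give that guarantee, which matches the "not 100%" caveat in the quote above.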
Is anyone using a store type other than mmapfs on Windows? How is the
performance?