JVM crash under load

Hi,

I am working on a project with elasticsearch version 0.20.2 in a single-node
configuration. The project runs fine under normal load; however, when there is
a heavy load with a mix of writes and reads on an index, we get a JVM crash
with an EXCEPTION_ACCESS_VIOLATION. We have only seen this crash occur on
Windows. I've attached the hs_err_pid.log file from the most recent crash.
This has occurred several times on both Java 6 and Java 7, with the same
Lucene call referenced as the offending frame in an elasticsearch cache
thread.

Any tips or insight into this issue would be greatly appreciated.

Thanks

-Jay

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/562ec64c-518c-421b-9d41-7c200baa570e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I see you were seeing crashes with Java 7u45. There are reports about
crashes with Java > 7u40.

Do JVM crashes also occur with Java 7u25?
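A minimal sketch of how one could check whether a given JVM falls in the range mentioned above. It assumes the pre-Java 9 version string format (for example 1.7.0_45, as reported by the java.version system property) and treats 7u40 itself as affected; the class and method names are hypothetical and not part of Elasticsearch or Lucene.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Elasticsearch or Lucene code): parses a legacy
 * version string such as "1.7.0_45" or "1.6.0_35-b10".
 */
final class JvmVersionCheck {
    // "1.<major>.<minor>_<update>", optionally followed by a suffix like "-b10".
    private static final Pattern LEGACY = Pattern.compile("1\\.(\\d+)\\.\\d+_(\\d+).*");

    /** Returns {major, update}, or null if the string is not in the legacy format. */
    static int[] parse(String version) {
        Matcher m = LEGACY.matcher(version);
        if (!m.matches()) {
            return null;
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    /** True for Java 7 update 40 and any later Java 7 update. */
    static boolean isJava7u40OrLater(String version) {
        int[] v = parse(version);
        return v != null && v[0] == 7 && v[1] >= 40;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println("java.version = " + running
                + ", Java 7u40 or later: " + isJava7u40OrLater(running));
    }
}
```

Run with the same java executable that starts Elasticsearch, this prints the running version and whether it is in the 7u40+ range; 1.7.0_25 and 1.6.0_35 both report false.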

Jörg

I'll set up a test with 7u25 tonight. One of the issues we are experiencing
is that there is no clear way to reproduce the crash. It has happened in under
30 minutes on one system, while others have run for 5+ days before crashing.

I have found Lucene issue LUCENE-5212 ("java 7u40 causes sigsegv and corrupt
term vectors"), which references problems in Java releases after 7u25 (fixed
in the upcoming 7u60). However, that issue describes a SIGSEGV (segfault), and
I have found very little about EXCEPTION_ACCESS_VIOLATION crashes in the JVM.

Also attaching the hs_err_pid.log from a Java 6 instance that crashed
(6u35).

(In reply to Jörg Prante, Monday, December 9, 2013 6:41:00 PM UTC-5.)

Please don't send log files to the list; you're sending them to
hundreds/thousands of people who don't want them.
Just use something like a gist or pastebin.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

(In reply to Jay Modi, 10 December 2013 12:28.)

Please check your Java installation. The Java home directory is
C:\Program Files (x86)\Java\jdk1.6.0_21, but the Java version in the report
is 1.6.0_35-b10.
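A hypothetical helper (not part of Elasticsearch) for spotting this kind of mismatch: it prints the JAVA_HOME environment variable next to the java.home and java.runtime.version system properties of the JVM it runs in, and warns when the running JVM does not live under JAVA_HOME. It assumes it is launched with the same java executable and environment as the Elasticsearch service. On Java 6 to 8, java.home of a JDK usually points at its jre subdirectory, which is why the check is a prefix match rather than equality.

```java
/**
 * Hypothetical helper: compares the JAVA_HOME environment variable with the
 * JVM that is actually running this code.
 */
final class JavaHomeCheck {
    /**
     * True if path equals base or lies underneath it. Separators are normalized,
     * but the comparison is case-sensitive (Windows paths are not).
     */
    static boolean isUnder(String path, String base) {
        String p = normalize(path);
        String b = normalize(base);
        return p.equals(b) || p.startsWith(b + "/");
    }

    // Treat '\' and '/' alike and drop trailing separators.
    private static String normalize(String path) {
        String s = path.replace('\\', '/');
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String envHome = System.getenv("JAVA_HOME");
        String runningHome = System.getProperty("java.home");
        System.out.println("JAVA_HOME            = " + envHome);
        System.out.println("java.home            = " + runningHome);
        System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
        if (envHome == null) {
            System.out.println("JAVA_HOME is not set");
        } else if (!isUnder(runningHome, envHome)) {
            System.out.println("WARNING: the running JVM is not the one JAVA_HOME points to");
        }
    }
}
```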

Jörg

After examining some other instances/environments, we have found more JVM
crashes with JDK 6u35:

    EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000000fe9a1a, pid=6016, tid=3540
    JRE version: 6.0_35-b10
    Java VM: Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode windows-amd64 compressed oops)
    Problematic frame:
    J  org.apache.lucene.index.TermBuffer.read(Lorg/apache/lucene/store/IndexInput;Lorg/apache/lucene/index/FieldInfos;)V
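When comparing many crash files across instances, a small parser can pull out the same header fields listed above. This is a minimal sketch, assuming the usual HotSpot hs_err layout where header lines start with '#' and the frame follows a "Problematic frame:" line; it also flags whether MMapIndexInput appears anywhere in the file. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: summarizes the header of an hs_err_pid*.log file. */
final class HsErrSummary {
    String error;            // e.g. "EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=..."
    String jreVersion;       // e.g. "6.0_35-b10"
    String problematicFrame; // the line after "Problematic frame:"
    boolean mentionsMMapIndexInput;

    static HsErrSummary parse(List<String> lines) {
        HsErrSummary s = new HsErrSummary();
        boolean frameNext = false;
        for (String raw : lines) {
            // Header lines look like "#  EXCEPTION_ACCESS_VIOLATION ..." or "# JRE version: ...".
            String line = raw.startsWith("#") ? raw.substring(1).trim() : raw.trim();
            if (frameNext && !line.isEmpty()) {
                s.problematicFrame = line;
                frameNext = false;
            } else if (line.startsWith("JRE version:")) {
                s.jreVersion = line.substring("JRE version:".length()).trim();
            } else if (line.startsWith("Problematic frame:")) {
                frameNext = true;
            } else if (s.error == null && (line.startsWith("EXCEPTION_") || line.startsWith("SIG"))) {
                s.error = line; // first Windows exception or POSIX signal line
            }
            if (raw.contains("MMapIndexInput")) {
                s.mentionsMMapIndexInput = true;
            }
        }
        return s;
    }

    @Override
    public String toString() {
        return "error=" + error + ", jre=" + jreVersion + ", frame=" + problematicFrame
                + ", MMapIndexInput=" + mentionsMMapIndexInput;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            // ISO-8859-1 maps every byte to a char, so odd bytes in the log cannot fail decoding.
            List<String> lines = Files.readAllLines(path, StandardCharsets.ISO_8859_1);
            System.out.println(path + ": " + parse(lines));
        }
    }
}
```

Passing every hs_err_pid*.log file as an argument gives one summary line per crash, which makes it easy to see whether the same frame and MMapIndexInput show up every time.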

We have only seen these on Windows, and they always occur in an elasticsearch
cache thread. One documented difference I have found with elasticsearch on
Windows is that it uses mmapfs stores by default
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html),
and in the crash log files we see MMapDirectory$MMapIndexInput in the
"Register to memory mapping" section (consistently, in all of our crashes).
Digging into Lucene's MMapDirectory, I found this Stack Overflow post by Uwe
Schindler
(http://stackoverflow.com/questions/8224843/jvm-crashes-on-lucene-datainput-readvint),
where he describes the MMapDirectory implementation in Lucene and mentions
that it is not 100% safe, so the calling application must ensure safety:

"By default MMapDirectory unmaps the files after closing the IndexInputs.
MMapDirectory is not synchronized at all, so when another thread tries to
access the IndexInput after unmapping it will access an unmapped address
and will SIGSEGV.

If your code would be correct this cannot happen, but it looks like you are
using an already closed IndexReader/IndexWriter to access the index. Before
Lucene 3.5 (will come out soon), missing checks in IndexReader will make it
possible that an already closed IndexReader with all its closed (and
unmapped) IndexInputs tries to access index data and segfaults.

In 3.5 we added additional safety checks to prevent this illegal access,
but its not 100% (as synchronization is missing). I would review the code
and check that nothing accesses closed index."

I believe we are hitting a rare case where elasticsearch has closed an
IndexReader but another thread is still trying to read from it. I have also
seen a similar report on this mailing list, although it was on SunOS:
https://groups.google.com/forum/#!topic/elasticsearch/e-Eh_guIxA4
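To illustrate the failure mode Uwe describes, and the usual defence against it, here is a hypothetical sketch (not Lucene or Elasticsearch code) of reference counting, loosely modeled on the incRef()/decRef() methods of Lucene's IndexReader. A boolean "mapped" flag stands in for the mapped memory: a search thread takes a reference before reading, so the owner's close() only releases the data once the last reader is done, and a thread that arrives after that gets null instead of touching released memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of reference-counted access to mapped index data. */
final class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1); // 1 = the owner's reference
    private volatile boolean mapped = true;                       // stands in for mmapped memory

    /** Acquire a reference; fails once the count has already dropped to zero. */
    boolean tryIncRef() {
        while (true) {
            int count = refCount.get();
            if (count <= 0) {
                return false; // already released: caller must not touch the data
            }
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
    }

    /** Release a reference; the last release "unmaps" the data. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            mapped = false; // real code would close/unmap the files here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    /** Owner's close: drops only the owner's reference. */
    void close() {
        decRef();
    }

    /** Stands in for reading mapped bytes; only legal while holding a reference. */
    int read() {
        if (!mapped) {
            // With a real MMapDirectory this would be a JVM crash, not an exception.
            throw new IllegalStateException("read after unmap");
        }
        return 42;
    }

    /** Pattern a search thread would follow. Returns null if already closed. */
    Integer readSafely() {
        if (!tryIncRef()) {
            return null;
        }
        try {
            return read();
        } finally {
            decRef();
        }
    }
}
```

The point of the design is that the "is it closed?" decision and the act of pinning the data happen atomically in tryIncRef(), which is exactly the synchronization a plain closed-flag check lacks.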

Is anyone using a store type other than mmapfs on Windows? How is the
performance?

Many bugs in that area have been fixed in ES 0.90+, so I suggest updating.

Jörg

The same goes for your Java version; 6u35 is pretty old.

Regards,
Mark Walkom

(In reply to joergprante@gmail.com, 13 December 2013 02:15.)