Today I hit a new issue on our production servers. Can anyone help identify the root cause? I am getting the following exception:
[2016-02-10 11:12:52,734][DEBUG][action.search.type ] [172.159.22.110] [908744] Failed to execute fetch phase
org.elasticsearch.ElasticsearchException: a fault occurred in a recent unsafe memory access operation in compiled Java code
at org.elasticsearch.ExceptionsHelper.convertToRuntime(ExceptionsHelper.java:44)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:513)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:452)
at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:449)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:254)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:335)
Thanks for the response.
I am using mmap, and there is enough disk space (around 4 TB). But when I get this exception, the server's memory is completely used (64 GB, no free memory left). Each server has 64 GB: 30 GB for the Elasticsearch heap, with the rest left to the OS for mmap.
Currently we are running jdk1.8.0_66, the latest one, on the Elasticsearch servers. The client servers have the same JVM version. I am still getting the exception a few times a day.
That's not the latest, 8u71 is, but the release notes contain nothing pertinent here. I remain unconvinced this is an Elasticsearch issue and I still strongly suspect an issue in the memory subsystem; check /var/log/kern.log and other such resources to see if anything presents.
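If it helps, here is a minimal sketch for pulling suspicious entries out of the kernel log. The default path and the keyword list are assumptions on my part; both vary by distro, so adjust them for your system:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class KernLogScan {
    // Keywords that commonly mark memory-subsystem trouble in kernel logs
    // (assumed list; extend it for your environment).
    private static final String[] KEYWORDS = {
        "mce", "machine check", "out of memory", "oom-killer", "i/o error", "bus error"
    };

    public static void main(String[] args) throws IOException {
        // Log path varies by distro; pass it as an argument if needed.
        String path = args.length > 0 ? args[0] : "/var/log/kern.log";
        try (Stream<String> lines = Files.lines(Paths.get(path))) {
            lines.filter(KernLogScan::suspicious).forEach(System.out::println);
        }
    }

    private static boolean suspicious(String line) {
        String lower = line.toLowerCase();
        for (String k : KEYWORDS) {
            if (lower.contains(k)) return true;
        }
        return false;
    }
}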
It would be interesting to know how you got into that situation. Do you have large files that exceed ~30 GB? From your setup I assume you have situations where ES is forced to read more than 30 GB from an mmap-ed file as a whole, which is a bit surprising to me.
You will see this error when the Linux kernel can no longer guarantee the integrity of an mmap-ed file to the JVM.
Either the disk is full and the kernel can't write to the mmap-ed file, so it gets truncated; or RAM is full and the kernel can no longer flush valid file buffers to disk, so the mmap-ed file gets corrupted.
Using the most recent kernel version possible might help, in that the JVM gets notified more reliably. But I think the underlying reason is that your system configuration is too lenient: you should not let the kernel run into such hard resource limits. Reserving more RAM for the OS file system cache, or limiting mmap usage, could help remedy the situation.
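For illustration, here is a minimal sketch of the truncation case (the file path and sizes are arbitrary demo values; on an ES node the same thing happens to Lucene segment files rather than a demo file). Truncating a file underneath a live mapping and then touching a page that has lost its backing store raises SIGBUS, which HotSpot surfaces as the InternalError in the stack trace above; the exact message varies with JDK version and whether the access was JIT-compiled:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapTruncationDemo {
    public static void main(String[] args) throws Exception {
        int size = 16 * 4096; // 64 KB, a few pages (arbitrary demo size)
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/mmap-demo.bin", "rw")) {
            raf.setLength(size);
            MappedByteBuffer buf = raf.getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, 0, size);
            buf.get(0);       // fine: this page is backed by the file

            raf.setLength(0); // truncate the file underneath the live mapping

            // The mapping still covers the old range, but the pages beyond
            // the new end-of-file have no backing store. Touching one raises
            // SIGBUS, which HotSpot reports as java.lang.InternalError
            // ("a fault occurred in a recent unsafe memory access
            // operation ..."), matching the exception above.
            buf.get(8 * 4096);
        }
    }
}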
We have a high volume on the ES cluster. Daily we receive around 5 billion docs, consuming about 2 TB of disk space, and some shards are around 200 GB. When you say "large files", do you mean a single shard, or Lucene's internal files?
I will have to enable the kernel logs.
We are running 10 ES servers, each with 64 GB RAM and 24 CPU cores. We have allocated 30 GB for heap space and the rest for the mmap file system.