mmapfs setting gives "java.io.IOException: Map failed" exception in Elasticsearch logs


(hrishikesh prabhune) #1

I have two Elasticsearch nodes set up with 30 GB of heap each on a large box with 128 GB of RAM and 3.1 TB of store size. In addition to all the standard configuration settings in the elasticsearch.yml file, I added one extra setting:

index.store.type: mmapfs

By default, Elasticsearch uses NIOFSDirectory, so I changed it to mmapfs and restarted the cluster with this new setting.

The index size and shard configuration are as follows:
index.number_of_shards: 10
index.number_of_replicas: 0

Each index is approximately 30 GB, and there are 432 indices in total.

The mmap setting works nicely until each node hits the 1041 GB virtual memory mark (stats noted using the top command). After that, recovery (assigning the unassigned shards) stops and I get the following errors in the logs:

[2013-11-06 21:50:52,190][WARN ][monitor.jvm ] [node0] [gc][ConcurrentMarkSweep][575][2] duration [22.1s], collections [2]/[23s], total [22.1s]/[22.1s], memory [11.6gb]->[11.3gb]/[29.8gb], all_pools {[Code Cache] [6.1mb]->[6.1mb]/[48mb]}{[Par Eden Space] [210.1mb]->[15.4mb]/[1.4gb]}{[Par Survivor Space] [191.3mb]->[0b]/[191.3mb]}{[CMS Old Gen] [11.2gb]->[11.3gb]/[28.1gb]}{[CMS Perm Gen] [30.3mb]->[30.3mb]/[82mb]}

[2013-11-06 21:51:04,303][WARN ][monitor.jvm ] [node0] [gc][ConcurrentMarkSweep][576][3] duration [12s], collections [1]/[12.1s], total [12s]/[34.2s], memory [11.3gb]->[11.3gb]/[29.8gb], all_pools {[Code Cache] [6.1mb]->[6.1mb]/[48mb]}{[Par Eden Space] [15.4mb]->[80.8kb]/[1.4gb]}{[Par Survivor Space] [0b]->[0b]/[191.3mb]}{[CMS Old Gen] [11.3gb]->[11.3gb]/[28.1gb]}{[CMS Perm Gen] [30.3mb]->[30.3mb]/[82mb]}

[2013-11-06 21:51:04,310][WARN ][indices.memory ] [node0] failed to set shard [2013-08-22.00:00][9] index buffer to [4mb]

[2013-11-06 21:51:04,311][WARN ][indices.cluster ] [node0] [2013-03-09.06:00][5] failed to start shard

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [2013-03-09.06:00][5] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [2013-03-09.06:00][5] failed to open reader on writer
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:290)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:610)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:200)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    ... 3 more
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
    at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:72)
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:454)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.<init>(Lucene41PostingsReader.java:72)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:430)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$BloomFilteredFieldsProducer.<init>(BloomFilterPostingsFormat.java:129)
    at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat.fieldsProducer(BloomFilterPostingsFormat.java:100)
    at org.elasticsearch.index.codec.postingsformat.ElasticSearch090PostingsFormat.fieldsProducer(ElasticSearch090PostingsFormat.java:81)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:194)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:233)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:127)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:
    at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
    at org.apache.lucene.index.ReadersAndLiveDocs.getReadOnlyClone(ReadersAndLiveDocs.java:218)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
    at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1457)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:278)
    ... 6 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
    ... 28 more

Additional information about the Linux machine on which Elasticsearch is running:

Output of ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1031971
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1031971
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

$ cat /proc/version
Linux version 2.6.32-358.11.1.el6.x86_64 (mockbuild@c6b7.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Jun 12 03:34:52 UTC 2013

$ cat /proc/meminfo

MemTotal: 132112280 kB
MemFree: 443680 kB
Buffers: 1396772 kB
Cached: 60330616 kB
SwapCached: 312 kB
Active: 43277292 kB
Inactive: 18489120 kB
Active(anon): 60172 kB
Inactive(anon): 99900 kB
Active(file): 43217120 kB
Inactive(file): 18389220 kB
Unevictable: 65066160 kB
Mlocked: 481892 kB
SwapTotal: 51511288 kB
SwapFree: 51509156 kB
Dirty: 464 kB
Writeback: 0 kB
AnonPages: 65105120 kB
Mapped: 121808 kB
Shmem: 3496 kB
Slab: 3874984 kB
SReclaimable: 3702696 kB
SUnreclaim: 172288 kB
KernelStack: 11808 kB
PageTables: 133852 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 117567428 kB
Committed_AS: 64937584 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 509920 kB
VmallocChunk: 34290676980 kB
HardwareCorrupted: 0 kB
AnonHugePages: 64514048 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 5056 kB
DirectMap2M: 2045952 kB
DirectMap1G: 132120576 kB

Even though I have set the virtual memory limit to unlimited, the Elasticsearch Java process still cannot go beyond a certain virtual memory allocation (1041 GB).
Has anyone faced a similar problem with the mmapfs setting? Can anybody explain why I am getting these IO exceptions in the Elasticsearch logs?
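
One OS-level ceiling worth checking in this situation (an assumption on my part, not something confirmed in this thread): Linux also caps the number of distinct memory mappings a single process may hold via vm.max_map_count, independently of ulimit -v, and Lucene's MMapDirectory opens index files in chunks, each consuming one map entry. A quick sketch for inspecting both:

```shell
# vm.max_map_count caps how many separate mappings one process may
# hold; the 2.6.x kernel default is 65530.
cat /proc/sys/vm/max_map_count

# Count the mappings a process currently holds -- here the current
# shell ($$) as a stand-in; for Elasticsearch, substitute the JVM's
# pid. Each line of /proc/<pid>/maps is one mapping.
map_count=$(wc -l < /proc/$$/maps)
echo "mappings: $map_count"
```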

Thanks !!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

The error message does not depend on the 1041 GB you observed in the OS as allocated process virtual memory. The JVM does not know much about OS limits; it can only react to what the OS's VM subsystem reports. Here, in the FileChannel map call, what matters is the size of the file to be opened: the JVM's request to the OS for a mapping of that size has failed.

On RedHat Linux, look at /etc/security/limits.conf.

Say you have a user "es" running ES: you have to unlock memory mapping in /etc/security/limits.conf, or you are limited to the default, which is 25% of the available RAM.

es soft memlock unlimited
es hard memlock unlimited
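
After editing limits.conf, the new limit only takes effect for fresh login sessions, so it is worth verifying before restarting ES. A small sketch (the "es" user name is the one assumed above):

```shell
# Show the effective locked-memory limit (RLIMIT_MEMLOCK) for the
# current shell; "unlimited" means requests will not be rejected for
# exceeding this particular limit.
ulimit -l

# limits.conf is applied by PAM at login, so re-check in a fresh
# session for the ES user after editing it (requires root):
#   su - es -c 'ulimit -l'
```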

Removing the memlock limit also enables the HugePages feature of the Linux kernel. HugePages are contiguous memory page areas that never swap. They are useful on large-memory machines to reduce page-based memory management overhead, and were originally developed for large database applications.
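
A quick way to see whether the kernel is already using huge pages (the paths are the usual RHEL 6 ones; adjust if your kernel differs):

```shell
# Show the transparent huge page policy; the bracketed value is the
# one currently active. The file may be absent on older kernels.
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null

# How much anonymous memory is currently backed by huge pages
# (the meminfo output in the original post showed ~64 GB).
grep AnonHugePages /proc/meminfo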

There are also recommendations, for older RHEL, to use a kB limit for memlock that is slightly less than the installed RAM.

If you are unsure, you can also follow the DataStax recommended production settings for Apache Cassandra for /etc/security/limits.conf; Cassandra has very similar resource demands to ES.

http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installRecommendSettings.html

Please note that your heap settings are very high. There is a risk of increased GC time and a sluggish system, and you may run out of resources sooner (yes, quite the opposite of what you intended). If you can lower the heap (say, to 8 GB), there is a chance FileChannel's map will work more smoothly. FileChannel map allocates direct memory outside of the heap; if the heap is very large, there is not much address space left for the JVM's mappings. Also, if memory allocations are scattered, the OS may return a failure for large mapping requests even when there seems to be more than enough free memory available. This can lead to an OOM when opening large files, simply because there is no contiguous address range left for a single mmap call.
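
To make the last point concrete, here is a small Python sketch (not ES code, just an illustration) showing that a memory map consumes virtual address space equal to the file size, regardless of how much physical memory or heap is in use; this address-space budget is exactly what a very large heap leaves less room for:

```python
import mmap
import os
import tempfile

# Create a sparse 1 GiB file: it occupies almost no disk space, but
# mapping it still reserves a full 1 GiB of virtual address space,
# just as FileChannel.map does for Lucene's index files.
size = 1 << 30
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(size)
    path = f.name

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping is as large as the file, yet no physical memory is
    # committed until pages are actually touched.
    assert len(m) == size
    m.close()

os.remove(path)
```

If the process cannot find a contiguous free range of that size in its address space, the map call fails, which on the JVM side surfaces as the "Map failed" OutOfMemoryError above.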

Jörg


(hrishikesh prabhune) #3

Thank you for a prompt and precise answer, it worked like a charm! Much appreciated!

Following your suggestion I reduced the heap size to 8 GB, but then GC started running every 3-5 seconds. I was not getting OOM errors in the log, just a lot of JVM warnings. I think that due to the huge number of shards and the large shard size, an 8 GB heap is not enough for me. Are there any additional settings you made in the Elasticsearch config file in order to run it with an 8 GB heap?



(Jörg Prante) #4

It looks like you want to run a really large number of shards/indices on two nodes, so maybe you should scale out to more nodes.

8 GB is already a large heap, but if you can experiment, you could increase it as long as you still have RAM left, to maybe 12 GB or 16 GB and so on, until you find a spot where the GC messages disappear. Note that roughly 50% of RAM should be left to the OS for the file system cache. There are also tweaks for slightly better GC of large heaps, but if you are sure your data is very large relative to the heap, you will not improve much with only two nodes. The main point is: for best performance, you should really scale ES out to more nodes. ES was designed for "scaling out" across many average servers, not for "scaling up" (within one or a few huge servers).

Jörg


