Merge and Shard Recovery failures because of lack of enough unfragmented virtual address space

Hi there,

We are seeing merge and shard recovery failures caused by a lack of enough unfragmented virtual address space. Please see the logs below.

We have a 10-node cluster (3 master, 4 data, 3 client nodes) running on Azure VMs with 56 GB of RAM (28 GB heap) and 8 cores.
Each index has 20 shards and 1 replica.
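For scale: with 20 primary shards and 1 replica per index spread over 4 data nodes, each data node holds on average 10 shard copies of every index, and each shard copy is a separate Lucene index with its own segment files. A quick sketch of that arithmetic follows; the function name is hypothetical, and the even spread assumes the default balanced shard allocation.

```python
def shard_copies_per_data_node(primary_shards, replicas, data_nodes):
    """Average number of shard copies of one index held by each data node.

    Every primary has `replicas` replica copies, so one index has
    primary_shards * (1 + replicas) shard copies in total. The result
    assumes they are spread evenly across the data nodes.
    """
    total_copies = primary_shards * (1 + replicas)
    return total_copies / data_nodes


# 20 primaries, 1 replica, 4 data nodes -> 10 shard copies per node, per index.
print(shard_copies_per_data_node(20, 1, 4))
```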

[2017-02-08 07:07:01,293][WARN ][indices.cluster          ] [ITTESPROD-DATA2] [[tracemessages][17]] marking and sending shard failed due to [failed recovery]
RecoveryFailedException[[tracemessages][17]: Recovery failed from {ITTESPROD-DATA4}{jBYtvSw0TcO_Ir3p8Tu2Tg}{10.158.36.208}{10.158.36.208:9300}{master=false} into {ITTESPROD-DATA2}{6gmReXcCTTOi1sYUYfp7yA}{10.158.36.212}{10.158.36.212:9300}{master=false}]; nested: RemoteTransportException[[ITTESPROD-DATA4][10.158.36.208:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NotSerializableExceptionWrapper[i_o_exception: The paging file is too small for this operation to complete: MMapIndexInput(path="J:\Data\ittesprod\nodes\0\indices\tracemessages\17\index\_25_Lucene50_0.tim") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 598455700 bytes. Windows is unfortunately very limited on virtual address space. If your index size is several hundred Gigabytes, consider changing to Linux. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]];
	at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:258)
	at org.elasticsearch.indices.recovery.RecoveryTarget.access$1100(RecoveryTarget.java:69)
	at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:508)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
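The recovery failure above and the merge failure below both report the file being mapped and the size of the chunk that could not be mapped (598455700 and 875590821 bytes, roughly 571 MiB and 835 MiB). When scanning many such log lines, a small helper can pull those two values out. This is a hypothetical sketch: `parse_mmap_failure` and `to_mib` are illustrative names, and the regex assumes the exact message wording shown in these logs.

```python
import re

# Matches the MMapIndexInput failure text seen in these Elasticsearch logs.
_MMAP_FAILURE = re.compile(
    r'MMapIndexInput\(path="(?P<path>[^"]+)"\).*?'
    r'map a chunk of (?P<chunk>\d+) bytes'
)


def parse_mmap_failure(line):
    """Return (file_path, chunk_bytes) from an mmap failure log line, or None."""
    match = _MMAP_FAILURE.search(line)
    if match is None:
        return None
    return match.group("path"), int(match.group("chunk"))


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (2**20 bytes)."""
    return n_bytes / 2**20
```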


[2017-02-06 02:07:08,081][ERROR][index.engine             ] [ITTESPROD-DATA2] [dependencyrequests-2017.04][17] failed to merge
java.io.IOException: The paging file is too small for this operation to complete: MMapIndexInput(path="F:\Data\ittesprod\nodes\0\indices\dependencyrequests-2017.04\17\index\_5q.fdt") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 875590821 bytes. Windows is unfortunately very limited on virtual address space. If your index size is several hundred Gigabytes, consider changing to Linux. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
	at sun.nio.ch.FileChannelImpl.map0(Native Method)
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:926)
	at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:273)
	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:247)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:151)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:121)
	at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsReader(Lucene50StoredFieldsFormat.java:173)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:117)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
	at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4212)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
	at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
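The merge stack trace bottoms out in `MMapDirectory.map` and `FileChannelImpl.map`: Lucene memory-maps segment files, which uses virtual address space in the process rather than JVM heap, and it splits large files into fixed-size chunks. As a rough illustration only, the sketch below swaps Java NIO for Python's `mmap` module and uses an arbitrary chunk size rather than Lucene's. It maps a file read-only in chunks; `map_in_chunks` is a hypothetical name.

```python
import mmap
import os


def map_in_chunks(path, chunk_size):
    """Map a file read-only as a list of mmap views of at most chunk_size bytes.

    The views occupy the process's virtual address space; the file contents
    are not copied onto the Python heap. Callers must close the returned views.
    """
    # mmap offsets must be multiples of the OS allocation granularity
    # (typically 64 KiB on Windows, usually the page size elsewhere).
    if chunk_size <= 0 or chunk_size % mmap.ALLOCATIONGRANULARITY != 0:
        raise ValueError(
            "chunk_size must be a positive multiple of mmap.ALLOCATIONGRANULARITY"
        )
    size = os.path.getsize(path)
    views = []
    with open(path, "rb") as f:
        try:
            offset = 0
            while offset < size:
                length = min(chunk_size, size - offset)
                # Each call reserves `length` bytes of address space; this is
                # the analogue of the map0 call that fails in the trace above.
                views.append(
                    mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=offset)
                )
                offset += length
        except BaseException:
            # Release any views already mapped before re-raising.
            for view in views:
                view.close()
            raise
    return views
```

A file of three allocation-granularity units plus 10 bytes, mapped with a chunk size of one unit, yields four views: three full chunks and a 10-byte tail.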

Following some suggestions to disable swapping, I have disabled paging (Advanced System Settings -> Performance Settings -> Advanced -> Virtual memory) and set the virtual memory size to 0 bytes.

So, in what scenarios does ES start using virtual memory? Is it only when the JVM heap runs out of space?

Also, is it a good idea to disable virtual memory in the Windows settings?

If I enable it, GC takes more than 30 seconds, pings time out, and the node gets disconnected.

Here is what the config looks like:

cluster.name: ittesprod
node.name: ITTESPROD-DATA2
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
network.host: _non_loopback_
http.enabled: false
discovery.zen.ping.unicast.hosts: ["10.158.36.220","10.158.36.200","10.158.36.201","10.158.36.202","10.158.36.210","10.158.36.211","10.158.36.212"]
threadpool.bulk.queue_size: 500
bootstrap.memory_lock: true
indices.memory.index_buffer_size : 25%
indices.requests.cache.size: 5%
indices.queries.cache.size: 15%
indices.store.throttle.max_bytes_per_sec : 500mb
cloud.azure.storage.default.account: xxxx
cloud.azure.storage.default.key:
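Two parts of this config can be sanity-checked with simple arithmetic. `discovery.zen.minimum_master_nodes: 2` matches the usual quorum rule, (master-eligible nodes / 2) + 1, for the 3 master nodes. The percentage settings `indices.memory.index_buffer_size`, `indices.queries.cache.size` and `indices.requests.cache.size` are fractions of the heap, so with a 28 GB heap they come to about 7 GB, 4.2 GB and 1.4 GB. A minimal sketch of those checks follows. `parse_settings`, `zen_quorum` and `heap_share_gb` are hypothetical helpers, and `parse_settings` only handles flat `key: value` lines, not general YAML.

```python
def parse_settings(text):
    """Parse flat `key: value` lines into a dict of strings.

    Not a YAML parser: it assumes one setting per line and splits on the
    first colon, which is enough for the settings listed above.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def zen_quorum(master_eligible_nodes):
    """Quorum for discovery.zen.minimum_master_nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def heap_share_gb(heap_gb, percent_setting):
    """Size in GB of a heap-percentage setting such as '25%'."""
    return heap_gb * float(percent_setting.rstrip("%")) / 100
```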
