Hi there,
We are seeing merge and shard recovery failures caused by a lack of unfragmented virtual address space.
Please see the logs below.
We have a 10-node cluster (3 master, 4 data, 3 client) on Azure VMs, each with 56 GB of RAM (28 GB heap) and 8 cores.
Each index has 20 shards and 1 replica.
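For reference, here is the shard arithmetic for that layout as a rough sketch. It assumes shard copies are spread evenly over the 4 data nodes, which the allocator does not guarantee, and the function name is only for illustration.

```python
def shard_copies_per_data_node(primaries, replicas, data_nodes):
    """Return (total shard copies per index, average copies per data node)."""
    # Each primary shard has `replicas` additional copies.
    total = primaries * (1 + replicas)
    return total, total / data_nodes


# 20 primaries, 1 replica, 4 data nodes (numbers from the description above)
total, per_node = shard_copies_per_data_node(primaries=20, replicas=1, data_nodes=4)
print(total, per_node)
```

So each index puts about 10 shard copies on every data node.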
[2017-02-08 07:07:01,293][WARN ][indices.cluster ] [ITTESPROD-DATA2] [[tracemessages][17]] marking and sending shard failed due to [failed recovery]
RecoveryFailedException[[tracemessages][17]: Recovery failed from {ITTESPROD-DATA4}{jBYtvSw0TcO_Ir3p8Tu2Tg}{10.158.36.208}{10.158.36.208:9300}{master=false} into {ITTESPROD-DATA2}{6gmReXcCTTOi1sYUYfp7yA}{10.158.36.212}{10.158.36.212:9300}{master=false}]; nested: RemoteTransportException[[ITTESPROD-DATA4][10.158.36.208:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: NotSerializableExceptionWrapper[i_o_exception: The paging file is too small for this operation to complete: MMapIndexInput(path="J:\Data\ittesprod\nodes\0\indices\tracemessages\17\index\_25_Lucene50_0.tim") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 598455700 bytes. Windows is unfortunately very limited on virtual address space. If your index size is several hundred Gigabytes, consider changing to Linux. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]];
at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:258)
at org.elasticsearch.indices.recovery.RecoveryTarget.access$1100(RecoveryTarget.java:69)
at org.elasticsearch.indices.recovery.RecoveryTarget$RecoveryRunner.doRun(RecoveryTarget.java:508)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2017-02-06 02:07:08,081][ERROR][index.engine ] [ITTESPROD-DATA2] [dependencyrequests-2017.04][17] failed to merge
java.io.IOException: The paging file is too small for this operation to complete: MMapIndexInput(path="F:\Data\ittesprod\nodes\0\indices\dependencyrequests-2017.04\17\index\_5q.fdt") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 875590821 bytes. Windows is unfortunately very limited on virtual address space. If your index size is several hundred Gigabytes, consider changing to Linux. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:926)
at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:273)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:247)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:151)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:121)
at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsReader(Lucene50StoredFieldsFormat.java:173)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:117)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4212)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
I have disabled the paging file (Advanced System Settings -> Performance -> Settings -> Advanced -> Virtual memory) and set virtual memory to 0 bytes, following some suggestions to disable swapping.
So in what scenario does ES start looking for virtual memory? Is that only when the JVM heap is out of space?
Also, is it a good idea to disable virtual memory in the Windows settings?
If I enable it, GC takes more than 30 seconds, the ping times out, and the node gets disconnected.
Here is what the config looks like:
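Going by the stack traces, the failing call is a memory map of an index file (FileChannelImpl.map under MMapDirectory), not a heap allocation. To make clear what I mean by that, here is a minimal sketch of the same kind of operation, written in Python rather than Java and using a hypothetical temporary file in place of a real segment file. It only illustrates file mapping in general, not how Lucene does it internally.

```python
import mmap
import os
import tempfile


def map_and_read(size_bytes=1024 * 1024):
    """Create a temp file, map it read-only, and return (mapped length, first byte)."""
    # Hypothetical stand-in for a segment file such as _5q.fdt.
    fd, path = tempfile.mkstemp(suffix=".fdt")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"\x00" * size_bytes)
        # Length 0 maps the whole file; the OS must find one contiguous
        # range of virtual addresses of that size for the call to succeed.
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return len(m), m[0]
    finally:
        os.remove(path)


print(map_and_read())
```

The mapped bytes come out of the process's virtual address space rather than the Java heap, which is why I am unsure how the paging file setting interacts with it.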
cluster.name: ittesprod
node.name: ITTESPROD-DATA2
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
network.host: _non_loopback_
http.enabled: false
discovery.zen.ping.unicast.hosts: ["10.158.36.220","10.158.36.200","10.158.36.201","10.158.36.202","10.158.36.210","10.158.36.211","10.158.36.212"]
threadpool.bulk.queue_size: 500
bootstrap.memory_lock: true
indices.memory.index_buffer_size : 25%
indices.requests.cache.size: 5%
indices.queries.cache.size: 15%
indices.store.throttle.max_bytes_per_sec : 500mb
cloud.azure.storage.default.account: xxxx
cloud.azure.storage.default.key: