Elasticsearch VM in VirtualBox Freezes after 15-30 minutes?

I haven't been able to troubleshoot why, after 15-30 minutes, one of my three VMs freezes and requires a restart. If I wait after the first freeze, a second VM will eventually freeze as well. I haven't noticed anything in particular in the Elasticsearch log files that would lead me to a solution, and I haven't found anything useful on Google.

My original cluster used Ubuntu Server 16.04 LTS, but when I started to experience problems I decided to rule out the OS, since I am successfully running two other VMs on this host machine that have run fine for a couple of years now. The other VMs are an Ubuntu 14.04 LTS and a Windows Server 2012. At this point I figure either I have something misconfigured or there is a bug in Elasticsearch that I somehow managed to find. Any help or suggestions would be greatly appreciated.

3 Ubuntu 14.04 LTS virtual machines hosted in VirtualBox v5.0.22
Java version 1.8.0_91
Elasticsearch version 2.3.3
Host machine has 32 GB of memory, a quad-core CPU, and plenty of storage


cluster.name: mycluster 
node.name: mycluster-node-1
path.data: /media/sf_ElasticSearchStorage/14/data
path.logs: /media/sf_ElasticSearchStorage/14/logs
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.unicast.hosts: ["", "", ""]
discovery.zen.minimum_master_nodes: 2
node.max_local_storage_nodes: 1
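One quick sanity check worth doing with `bootstrap.mlockall: true`: memory locking fails silently when the memlock ulimit is too low, so it pays to confirm it actually took effect. A sketch (the node address in the commented API call is an assumption; adjust it to one of your nodes):

```shell
# bootstrap.mlockall silently fails if the memlock ulimit is too low.
# Show the current locked-memory limit (should print "unlimited"):
ulimit -l
# With the cluster up, the nodes API reports whether locking worked
# (address is an assumption -- adjust to one of your nodes):
# curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall
```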







ExecStart=/usr/share/elasticsearch/bin/elasticsearch \
                                                -Des.pidfile=${PID_DIR}/elasticsearch.pid \
                                                -Des.default.path.home=${ES_HOME} \
                                                -Des.default.path.logs=${LOG_DIR} \
                                                -Des.default.path.data=${DATA_DIR} \


# Specifies the maximum file descriptor number that can be opened by this process

# Specifies the maximum number of bytes of memory that may be locked into RAM
# Set to "infinity" if you use the 'bootstrap.mlockall: true' option
# in elasticsearch.yml and 'MAX_LOCKED_MEMORY=unlimited' in /etc/default/elasticsearch

# Disable timeout logic and wait until process is stopped

# SIGTERM signal is used to stop the Java process

# Java process is never killed

# When a JVM receives a SIGTERM signal it exits with code 143
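Per the comments above, under systemd the file-descriptor and locked-memory limits come from the unit file rather than `/etc/default/elasticsearch` (Ubuntu 14.04 itself uses upstart, so this only applies if your nodes run systemd). A sketch of the relevant lines; the values are illustrative, not taken from this thread:

```ini
# /usr/lib/systemd/system/elasticsearch.service (excerpt, a sketch)
[Service]
LimitNOFILE=65536
LimitMEMLOCK=infinity
```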



# Elasticsearch configuration directory

# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g

# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.mlockall: true' option
# in elasticsearch.yml (ES_HEAP_SIZE  must also be set).
# When using Systemd, the LimitMEMLOCK property must be set
# in /usr/lib/systemd/system/elasticsearch.service

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
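The commented settings above correspond to entries like the following in `/etc/default/elasticsearch`. This is a sketch; the values are assumptions following the comments' own guidance (heap at ~50% of the VM's RAM, capped at 31g), not the poster's actual file:

```shell
# /etc/default/elasticsearch -- illustrative values, a sketch
# Heap: roughly 50% of the VM's RAM, never more than 31g
ES_HEAP_SIZE=2g
# Required alongside 'bootstrap.mlockall: true' in elasticsearch.yml
MAX_LOCKED_MEMORY=unlimited
MAX_OPEN_FILES=65536
```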

It's not a storage issue, is it?

I don't believe so, since the drive that contains path.data and path.logs has around 160 GB of free space, and the drive that holds the VM images has around 20 GB free. I am, however, using VirtualBox's dynamically allocated disks instead of fixed-size disks, in case someone thinks that could be an issue or has had problems with that in the past.

I noticed this evening that VirtualBox released a new version, 5.0.24, so I'll install that tonight and see if that changes anything.

All three are fully patched 14.04 LTS, and I verified that all of them, including where the data and log information is stored, have more than enough space for the foreseeable future.

Upgraded to:
VirtualBox 5.0.24
Elasticsearch 2.3.4

Changed my VirtualBox VMs from dynamically allocated to a fixed size of 30 GB each.
Changed discovery.zen.minimum_master_nodes from 2 to 1.

I still cannot keep the VMs from freezing; however, it will now run for almost an hour before one of the three VMs freezes. Eventually all three freeze and have to be restarted. I'm at the point where I'm not sure what else to try.

Attached below are the last few lines from the Elasticsearch log files.

VM 1:

NodeDisconnectedException[[node-3][][indices:data/write/bulk[s][r]] disconnected]
[2016-07-19 12:01:40,219][DEBUG][action.admin.indices.stats] [node-2] failed to execute [indices:monitor/stats] on node [yxAtaU5CTyWvjKijIA2Wmw]
NodeDisconnectedException[[node-3][][indices:monitor/stats[n]] disconnected]
[2016-07-19 12:01:40,222][DEBUG][action.admin.cluster.node.stats] [node-2] failed to execute on node [yxAtaU5CTyWvjKijIA2Wmw]
NodeDisconnectedException[[node-3][][cluster:monitor/nodes/stats[n]] disconnected]

VM 2

[2016-07-19 00:01:32,294][WARN ][transport.netty          ] [node-3] exception caught on transport layer [[id: 0xc83155bb, / => /]], closing connection
java.io.IOException: No route to host

VM 3

[2016-07-18 21:25:51,543][WARN ][cluster.action.shard     ] [node-1] [.marvel-es-1-2016.07.19][0] received shard failed for target shard [[.marvel-es-1-2016.07.19][0], node[HfPYw5rxRhyf7hqVHEr1LQ], [P], v[7], s[STARTED], a[id=S9W8JdQ0Rb-NkC4ZxxCL7w]], indexUUID [Ny-N-WJ7SZiPERBeOTf3rg], message [engine failure, reason [merge failed]], failure [MergeException[java.io.IOException: Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")]; nested: IOException[Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")]; nested: IOException[Invalid argument]; ]
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")
    at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$1.doRun(InternalEngine.java:1241)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:189)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.doReset(CompressingStoredFieldsReader.java:409)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.reset(CompressingStoredFieldsReader.java:394)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:573)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:601)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
Caused by: java.io.IOException: Invalid argument
    at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:741)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
    ... 15 more
[2016-07-18 21:25:51,546][INFO ][cluster.routing.allocation] [node-1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[.marvel-es-1-2016.07.19][0]] ...]).
[2016-07-18 21:26:09,638][INFO ][cluster.routing.allocation] [node-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.marvel-es-1-2016.07.19][0]] ...]).

I believe I have discovered what was causing my problem. I removed marvel-agent from my VMs and all three VMs have been running for over 24hrs.

This is not causing your problem; it's only a symptom.

This is your problem: it means the JVM in your VirtualBox VM is not able to read large files (larger than 2 GB). Check whether your VM limits and capabilities are configured correctly. Maybe your JVM or some other component is not capable of 64-bit I/O, but that would be peculiar.
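The "Invalid argument" in the merge stack trace fits this: Lucene issues pread() calls at offsets past 2 GB once segment files grow. You can probe whether the shared-folder mount handles such reads with a quick sketch like this; `SHARED_DIR` and the file name are assumptions (point `SHARED_DIR` at the vboxsf mount, e.g. /media/sf_ElasticSearchStorage), and vboxsf mounts have been known to fail here:

```shell
# Sketch: probe reads past the 2 GiB boundary on a given mount.
# SHARED_DIR is an assumption -- set it to your vboxsf mount;
# it defaults to /tmp here so the sketch runs anywhere.
DIR="${SHARED_DIR:-/tmp}"
F="$DIR/es_bigfile_test"
# Create a sparse 3 GiB file (instant, consumes almost no real space)
dd if=/dev/zero of="$F" bs=1M count=0 seek=3072 2>/dev/null
# Read 1 MiB from beyond the 2 GiB mark; "Invalid argument" (EINVAL)
# here would match the Lucene failure above
dd if="$F" of=/dev/null bs=1M skip=2560 count=1 && echo "large read OK"
# Clean up when done:
# rm "$F"
```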

I kinda figured, but was surprised it made a difference.

I assume you are saying either this setting is too small or there is another issue with the VM or JVM. I did verify that on startup Elasticsearch shows the setting below and reports "compressed ordinary object pointers [true]".
Found in: /etc/default/elasticsearch

As I get time this afternoon I plan to take another look at the settings, but wanted to post an update.

Updated to the latest Java 8 release (8u101) and VirtualBox Guest Additions 5.0.26.

Stopped using VirtualBox's "Shared Folders" feature and haven't had a problem since. I mounted a network share instead, which I should have done from the start.
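For anyone following along, a network-share mount could look like this /etc/fstab sketch. It's an NFS example; the server name, export path, and mount point are all placeholders, and the thread doesn't say which share type was actually used:

```
# /etc/fstab (sketch; host, export path, and mount point are placeholders)
fileserver:/export/elasticsearch  /mnt/elasticsearch  nfs  defaults,_netdev  0  0
```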