Elasticsearch VM in VirtualBox Freezes after 15-30 minutes?

I haven't been able to troubleshoot why, after 15-30 minutes, one of my three VMs freezes and requires a restart. If I wait after the first freeze, a second VM will eventually freeze as well. I haven't noticed anything in particular in the Elasticsearch log files that would lead me to a solution, and I haven't found anything useful on Google.

My original cluster used Ubuntu Server 16.04 LTS, but when I started to experience problems I decided to rule out the OS, since I am successfully running two other VMs on this host machine that have run fine for a couple of years now. The other VMs are an Ubuntu 14.04 LTS and a Windows Server 2012. At this point I figure either I have something misconfigured or there is a bug in Elasticsearch that I somehow managed to find. Any help or suggestions would be greatly appreciated.

3 Ubuntu 14.04 LTS virtual machines hosted in VirtualBox v5.0.22
Java version 1.8.0_91
Elasticsearch version 2.3.3
Host machine has 32 GB of memory, a quad-core CPU, and plenty of storage


cluster.name: mycluster 
node.name: mycluster-node-1
path.data: /media/sf_ElasticSearchStorage/14/data
path.logs: /media/sf_ElasticSearchStorage/14/logs
bootstrap.mlockall: true
http.port: 9200
discovery.zen.ping.unicast.hosts: ["", "", ""]
discovery.zen.minimum_master_nodes: 2
node.max_local_storage_nodes: 1
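One quick sanity check worth doing with `bootstrap.mlockall: true`: memory locking fails silently when the memlock ulimit is too low, so it pays to confirm it actually took effect. A sketch (the node address in the commented API call is an assumption; adjust it to one of your nodes):

```shell
# bootstrap.mlockall silently fails if the memlock ulimit is too low.
# Show the current locked-memory limit (should print "unlimited"):
ulimit -l
# With the cluster up, the nodes API reports whether locking worked
# (address is an assumption -- adjust to one of your nodes):
# curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall
```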







ExecStart=/usr/share/elasticsearch/bin/elasticsearch \
                                                -Des.pidfile=${PID_DIR}/elasticsearch.pid \
                                                -Des.default.path.home=${ES_HOME} \
                                                -Des.default.path.logs=${LOG_DIR} \
                                                -Des.default.path.data=${DATA_DIR} \


# Specifies the maximum file descriptor number that can be opened by this process

# Specifies the maximum number of bytes of memory that may be locked into RAM
# Set to "infinity" if you use the 'bootstrap.mlockall: true' option
# in elasticsearch.yml and 'MAX_LOCKED_MEMORY=unlimited' in /etc/default/elasticsearch

# Disable timeout logic and wait until process is stopped

# SIGTERM signal is used to stop the Java process

# Java process is never killed

# When a JVM receives a SIGTERM signal it exits with code 143
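Per the comments above, under systemd the file-descriptor and locked-memory limits come from the unit file rather than `/etc/default/elasticsearch` (Ubuntu 14.04 itself uses upstart, so this only applies if your nodes run systemd). A sketch of the relevant lines; the values are illustrative, not taken from this thread:

```ini
# /usr/lib/systemd/system/elasticsearch.service (excerpt, a sketch)
[Service]
LimitNOFILE=65536
LimitMEMLOCK=infinity
```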



# Elasticsearch configuration directory

# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g

# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.mlockall: true' option
# in elasticsearch.yml (ES_HEAP_SIZE  must also be set).
# When using Systemd, the LimitMEMLOCK property must be set
# in /usr/lib/systemd/system/elasticsearch.service

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
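The commented settings above correspond to entries like the following in `/etc/default/elasticsearch`. This is a sketch; the values are assumptions following the comments' own guidance (heap at ~50% of the VM's RAM, capped at 31g), not the poster's actual file:

```shell
# /etc/default/elasticsearch -- illustrative values, a sketch
# Heap: roughly 50% of the VM's RAM, never more than 31g
ES_HEAP_SIZE=2g
# Required alongside 'bootstrap.mlockall: true' in elasticsearch.yml
MAX_LOCKED_MEMORY=unlimited
MAX_OPEN_FILES=65536
```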

It's not a storage issue, is it?

I don't believe so, since the drive that contains path.data and path.logs has around 160 GB of free space, and the drive that holds the VM images has around 20 GB free. I am, however, using VirtualBox's dynamically allocated disks instead of fixed-size disks, in case someone thinks that could be an issue or has had problems with that in the past.

I noticed this evening that VirtualBox released a new version, 5.0.24, so I'll install that tonight and see if that changes anything.

All three are fully patched 14.04 LTS, and I verified that all of them, including where the data and log information is stored, have more than enough space for the foreseeable future.

Upgraded to:
VirtualBox 5.0.24
Elasticsearch 2.3.4

Changed my VirtualBox VMs from dynamically allocated to a fixed size of 30 GB each.
Changed discovery.zen.minimum_master_nodes from 2 to 1.

I still cannot keep the VMs from freezing; however, it will now run for almost an hour before one of the three VMs freezes. Eventually all three freeze and have to be restarted. I'm at the point where I'm not sure what else to try.

Attached below are the last few lines from the Elasticsearch log files.

VM 1:

NodeDisconnectedException[[node-3][][indices:data/write/bulk[s][r]] disconnected]
[2016-07-19 12:01:40,219][DEBUG][action.admin.indices.stats] [node-2] failed to execute [indices:monitor/stats] on node [yxAtaU5CTyWvjKijIA2Wmw]
NodeDisconnectedException[[node-3][][indices:monitor/stats[n]] disconnected]
[2016-07-19 12:01:40,222][DEBUG][action.admin.cluster.node.stats] [node-2] failed to execute on node [yxAtaU5CTyWvjKijIA2Wmw]
NodeDisconnectedException[[node-3][][cluster:monitor/nodes/stats[n]] disconnected]

VM 2

[2016-07-19 00:01:32,294][WARN ][transport.netty          ] [node-3] exception caught on transport layer [[id: 0xc83155bb, / => /]], closing connection
java.io.IOException: No route to host

VM 3

[2016-07-18 21:25:51,543][WARN ][cluster.action.shard     ] [node-1] [.marvel-es-1-2016.07.19][0] received shard failed for target shard [[.marvel-es-1-2016.07.19][0], node[HfPYw5rxRhyf7hqVHEr1LQ], [P], v[7], s[STARTED], a[id=S9W8JdQ0Rb-NkC4ZxxCL7w]], indexUUID [Ny-N-WJ7SZiPERBeOTf3rg], message [engine failure, reason [merge failed]], failure [MergeException[java.io.IOException: Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")]; nested: IOException[Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")]; nested: IOException[Invalid argument]; ]
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")
    at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$1.doRun(InternalEngine.java:1241)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid argument: NIOFSIndexInput(path="/media/sf_ElasticSearchStorage/14p/data/mimir/nodes/0/indices/.marvel-es-1-2016.07.19/0/index/_n2.fdt")
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:189)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.doReset(CompressingStoredFieldsReader.java:409)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.reset(CompressingStoredFieldsReader.java:394)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:573)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:601)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
Caused by: java.io.IOException: Invalid argument
    at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:741)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
    ... 15 more
[2016-07-18 21:25:51,546][INFO ][cluster.routing.allocation] [node-1] Cluster health status changed from [GREEN] to [YELLOW] (reason: [shards failed [[.marvel-es-1-2016.07.19][0]] ...]).
[2016-07-18 21:26:09,638][INFO ][cluster.routing.allocation] [node-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.marvel-es-1-2016.07.19][0]] ...]).

I believe I have discovered what was causing my problem. I removed marvel-agent from my VMs and all three VMs have been running for over 24hrs.

This is not causing your problem; it's only a symptom.

This is your problem: it means the JVM in your VirtualBox VM is not able to read large files (larger than 2 GB). Check whether your VM limits and capabilities are configured correctly. Maybe your JVM or some other component is not capable of 64-bit I/O, but that would be peculiar.
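The "Invalid argument" in the merge stack trace fits this: Lucene issues pread() calls at offsets past 2 GB once segment files grow. You can probe whether the shared-folder mount handles such reads with a quick sketch like this; `SHARED_DIR` and the file name are assumptions (point `SHARED_DIR` at the vboxsf mount, e.g. /media/sf_ElasticSearchStorage), and vboxsf mounts have been known to fail here:

```shell
# Sketch: probe reads past the 2 GiB boundary on a given mount.
# SHARED_DIR is an assumption -- set it to your vboxsf mount;
# it defaults to /tmp here so the sketch runs anywhere.
DIR="${SHARED_DIR:-/tmp}"
F="$DIR/es_bigfile_test"
# Create a sparse 3 GiB file (instant, consumes almost no real space)
dd if=/dev/zero of="$F" bs=1M count=0 seek=3072 2>/dev/null
# Read 1 MiB from beyond the 2 GiB mark; "Invalid argument" (EINVAL)
# here would match the Lucene failure above
dd if="$F" of=/dev/null bs=1M skip=2560 count=1 && echo "large read OK"
# Clean up when done:
# rm "$F"
```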

I kinda figured, but was surprised it made a difference.

I assume you are saying either this setting is too small or there is another issue with the VM or JVM. I did verify that on startup Elasticsearch shows the setting below and reports "compressed ordinary object pointers [true]".
Found in: /etc/default/elasticsearch

As I get time this afternoon I plan to take another look at the settings, but wanted to post an update.

Updated to the latest Java 8 release (8u101) and VirtualBox Guest Additions 5.0.26.

Stopped using VirtualBox's "Shared Folders" feature and haven't had a problem since. I mounted a network share instead, which I should have done from the start.
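For anyone following along, a network-share mount could look like this /etc/fstab sketch. It's an NFS example; the server name, export path, and mount point are all placeholders, and the thread doesn't say which share type was actually used:

```
# /etc/fstab (sketch; host, export path, and mount point are placeholders)
fileserver:/export/elasticsearch  /mnt/elasticsearch  nfs  defaults,_netdev  0  0
```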