Unable to start Elasticsearch after disk space issue

Hello,

We have had ES running for a few months now and recently noticed that the E: drive on Windows was running very low on space. After adding 100GB to the E: drive, we tried restarting the service and the server multiple times, but ES does not start.
Below is the error message:

e:\Elastic\Elasticsearch\bin>elasticsearch.exe
[2017-09-29T10:34:30,120][INFO ][o.e.n.Node ] [esprod] initializing ...
[2017-09-29T10:34:30,343][INFO ][o.e.e.NodeEnvironment ] [esprod] using [1] data paths, mounts [[New Volume (E:)]], net usable_space [100.3gb], net total_space [399.9gb], spins? [unknown], types [NTFS]
[2017-09-29T10:34:30,344][INFO ][o.e.e.NodeEnvironment ] [esprod] heap size [11.8gb], compressed ordinary object pointers [true]

Could not start process within (00:02:00): C:\Program Files\Java\jdk1.8.0_141\bin\java.exe -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Xmx12288m -Xms12288m -Delasticsearch -Des.path.home="E:\Elastic\Elasticsearch" -cp "E:\Elastic\Elasticsearch\lib\elasticsearch-5.5.0.jar;E:\Elastic\Elasticsearch\lib\HdrHistogram-2.1.9.jar;E:\Elastic\Elasticsearch\lib\hppc-0.7.1.jar;E:\Elastic\Elasticsearch\lib\jackson-core-2.8.6.jar;E:\Elastic\Elasticsearch\lib\jackson-dataformat-cbor-2.8.6.jar;E:\Elastic\Elasticsearch\lib\jackson-dataformat-smile-2.8.6.jar;E:\Elastic\Elasticsearch\lib\jackson-dataformat-yaml-2.8.6.jar;E:\Elastic\Elasticsearch\lib\java-version-checker-5.5.0.jar;E:\Elastic\Elasticsearch\lib\jna-4.4.0.jar;E:\Elastic\Elasticsearch\lib\joda-time-2.9.5.jar;E:\Elastic\Elasticsearch\lib\jopt-simple-5.0.2.jar;E:\Elastic\Elasticsearch\lib\jts-1.13.jar;E:\Elastic\Elasticsearch\lib\log4j-1.2-api-2.8.2.jar;E:\Elastic\Elasticsearch\lib\log4j-api-2.8.2.jar;E:\Elastic\Elasticsearch\lib\log4j-core-2.8.2.jar;E:\Elastic\Elasticsearch\lib\lucene-analyzers-common-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-backward-codecs-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-core-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-grouping-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-highlighter-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-join-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-memory-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-misc-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-queries-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-queryparser-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-sandbox-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-spatial-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-spatial-extras-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-spatial3d-6.6.0.jar;E:\Elastic\Elasticsearch\lib\lucene-suggest-6.6.0.jar;E:\Elastic\Elasticsearch\lib\plugin-cli-5.5.0.jar;E:\Elastic\Elasticsearch\lib\securesm-1.1.jar;E:\Elastic\Elasticsearch\lib\snakeyaml-1.15.jar;E:\Elastic\Elasticsearch\lib\spatial4j-0.6.jar;E:\Elastic\Elasticsearch\lib\t-digest-3.0.jar" org.elasticsearch.bootstrap.Elasticsearch -Epath.conf="E:\Elastic\Elasticsearch\config"
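
For anyone wanting to double-check that disk space is no longer the blocker, the NodeEnvironment line in the output above already reports usable and total space. Here is a minimal Python sketch that pulls those two numbers out of the line and computes the free fraction. The function name and regex are my own, and it only handles values reported in gb, as in this log.

```python
import re

# Sample line copied from the startup output in the post above.
LINE = ("[2017-09-29T10:34:30,343][INFO ][o.e.e.NodeEnvironment ] [esprod] "
        "using [1] data paths, mounts [[New Volume (E:)]], "
        "net usable_space [100.3gb], net total_space [399.9gb], "
        "spins? [unknown], types [NTFS]")

# Assumption: both sizes are printed in gb, as they are in this log line.
SPACE_RE = re.compile(
    r"net usable_space \[([\d.]+)gb\], net total_space \[([\d.]+)gb\]"
)

def free_fraction(line):
    """Return usable/total as a fraction, or None if the line does not match."""
    m = SPACE_RE.search(line)
    if m is None:
        return None
    usable, total = float(m.group(1)), float(m.group(2))
    return usable / total

frac = free_fraction(LINE)
print(f"free: {frac:.1%}")  # roughly a quarter of the volume is free
```

With about a quarter of the volume free, the node at least sees the added space, which suggests (but does not prove) that the startup timeout has another cause.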

I am also seeing this timeout problem. I have two different clusters that crashed due to an out-of-memory exception, and trying to restart either of them causes this error. Cluster A has no plugins and Cluster B has the Search Guard plugin. It seems that Cluster A was able to start within the 2-minute timeout after I deleted the node.lock file in the data folder. However, this doesn't seem to work with Cluster B. This leads me to believe that the issue is more likely how long it takes for the indices to load, as there were about 250 indices on Cluster A and 7000 (yes, 7000) on Cluster B.
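
In case it helps others compare their setup, here is a rough Python sketch that reports any leftover node.lock files and counts index directories under a data path. It assumes the 5.x on-disk layout of `<data>/nodes/<ordinal>/node.lock` and `<data>/nodes/<ordinal>/indices/<index-uuid>/`; the function name is made up for illustration. It only reads the directory tree and deletes nothing.

```python
import os

def inspect_data_path(data_path):
    """Return (lock_files, index_dir_count) for an Elasticsearch data path.

    Assumes the layout <data_path>/nodes/<ordinal>/node.lock and
    <data_path>/nodes/<ordinal>/indices/<index-uuid>/ (one directory per index).
    """
    lock_files = []
    index_dirs = 0
    nodes_dir = os.path.join(data_path, "nodes")
    if not os.path.isdir(nodes_dir):
        return lock_files, index_dirs
    for ordinal in sorted(os.listdir(nodes_dir)):
        node_dir = os.path.join(nodes_dir, ordinal)
        # Record a lock file if one is present for this node ordinal.
        lock = os.path.join(node_dir, "node.lock")
        if os.path.isfile(lock):
            lock_files.append(lock)
        # Count one directory per index under indices/.
        indices_dir = os.path.join(node_dir, "indices")
        if os.path.isdir(indices_dir):
            index_dirs += sum(
                1 for name in os.listdir(indices_dir)
                if os.path.isdir(os.path.join(indices_dir, name))
            )
    return lock_files, index_dirs
```

Call it with your own data folder, for example `inspect_data_path(r"D:\path\to\data")` (a placeholder path). If you do decide to remove a node.lock by hand, make sure no Elasticsearch process is still running against that folder first.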

[Edit] Forgot to mention: both clusters run version 5.5.2, installed with the Windows MSI installer.

7000 indices? Unless that is quite a large cluster, you should read this blog post about shards and sharding.
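
To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes every index was created with the 5.x defaults of 5 primary shards and 1 replica, which nobody in this thread has confirmed, so treat the result as an upper-end estimate rather than a measurement.

```python
def estimated_shards(index_count, primaries_per_index=5, replicas=1):
    """Total shard copies = indices * primaries * (1 + replicas).

    The defaults mirror the Elasticsearch 5.x index defaults; the actual
    settings of the clusters discussed here are unknown.
    """
    return index_count * primaries_per_index * (1 + replicas)

for n in (250, 7000):
    print(n, "indices ->", estimated_shards(n), "shard copies")
# 250 indices -> 2500 shard copies
# 7000 indices -> 70000 shard copies
```

On a single-node cluster the replica copies would stay unassigned, so pass `replicas=0` to count only the shards the node actually has to open.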

Hi, sorry, no solution from me. Just thought I'd add that I am experiencing the same issue. The cluster was likely in some form of distress, as it filled up the drive (the install drive, not the data drive) with 35GB of log data and the Elasticsearch service stopped. Since looking at a 35GB log file is a bit troublesome, I deleted the file to free up some space and restarted ES, so unfortunately I don't have that log. On restart, I got a carbon copy of what you have above and no other errors (even after setting the log4j root logging level to trace), so I still don't know what is wrong.

I'm running Elasticsearch 5.5.2 as a service on a Windows Server 2012 box.

Here's the output I see:
[2017-10-05T12:50:53,064][DEBUG][o.e.b.JNAKernel32Library ] windows/Kernel32 library loaded
[2017-10-05T12:50:53,073][DEBUG][o.e.b.SystemCallFilter ] Windows ActiveProcessLimit initialization successful
[2017-10-05T12:50:53,074][DEBUG][o.e.b.JNANatives ] console ctrl handler correctly set
[2017-10-05T12:50:53,082][DEBUG][o.e.b.JarHell ] java.class.path: D:\Mms\ElasticStack\Elasticsearch\lib\elasticsearch-5.5.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\HdrHistogram-2.1.9.jar;D:\Mms\ElasticStack\Elasticsearch\lib\hppc-0.7.1.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jackson-core-2.8.6.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jackson-dataformat-cbor-2.8.6.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jackson-dataformat-smile-2.8.6.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jackson-dataformat-yaml-2.8.6.jar;D:\Mms\ElasticStack\Elasticsearch\lib\java-version-checker-5.5.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jna-4.4.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\joda-time-2.9.5.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jopt-simple-5.0.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\jts-1.13.jar;D:\Mms\ElasticStack\Elasticsearch\lib\log4j-1.2-api-2.8.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\log4j-api-2.8.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\log4j-core-2.8.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-analyzers-common-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-backward-codecs-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-core-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-grouping-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-highlighter-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-join-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-memory-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-misc-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-queries-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-queryparser-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-sandbox-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-spatial-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-spatial-extras-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-spatial3d-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\lucene-suggest-6.6.0.jar;D:\Mms\ElasticStack\Elasticsearch\lib\plugin-cli-5.5.2.jar;D:\Mms\ElasticStack\Elasticsearch\lib\securesm-1.1.jar;D:\Mms\ElasticStack\Elasticsearch\lib\snakeyaml-1.15.jar;D:\Mms\ElasticStack\Elasticsearch\lib\spatial4j-0.6.jar;D:\Mms\ElasticStack\Elasticsearch\lib\t-digest-3.0.jar
[2017-10-05T12:50:53,082][DEBUG][o.e.b.JarHell ] sun.boot.class.path: C:\Program Files\Java\jre1.8.0_144\lib\resources.jar;C:\Program Files\Java\jre1.8.0_144\lib\rt.jar;C:\Program Files\Java\jre1.8.0_144\lib\sunrsasign.jar;C:\Program Files\Java\jre1.8.0_144\lib\jsse.jar;C:\Program Files\Java\jre1.8.0_144\lib\jce.jar;C:\Program Files\Java\jre1.8.0_144\lib\charsets.jar;C:\Program Files\Java\jre1.8.0_144\lib\jfr.jar;C:\Program Files\Java\jre1.8.0_144\classes
[2017-10-05T12:50:53,083][DEBUG][o.e.b.JarHell ] classloader urls: [file:/D:/Mms/ElasticStack/Elasticsearch/lib/elasticsearch-5.5.2.jar ... (snipped for length restrictions)
ms/ElasticStack/Elasticsearch/lib/securesm-1.1.jar, file:/D:/Mms/ElasticStack/Elasticsearch/lib/snakeyaml-1.15.jar, file:/D:/Mms/ElasticStack/Elasticsearch/lib/spatial4j-0.6.jar, file:/D:/Mms/ElasticStack/Elasticsearch/lib/t-digest-3.0.jar]
[2017-10-05T12:50:53,086][DEBUG][o.e.b.JarHell ] java.home: C:\Program Files\Java\jre1.8.0_144
[2017-10-05T12:50:53,087][DEBUG][o.e.b.JarHell ] examining jar: ... (snipped for length restrictions)
[2017-10-05T12:50:53,269][INFO ][o.e.n.Node ] [NORPEWMMON1] initializing ...
[2017-10-05T12:50:53,320][TRACE][o.e.e.NodeEnvironment ] [NORPEWMMON1] obtaining node lock on \m-part.active.preprod\mms\master\Elasticsearch\Data\nodes\0 ...
[2017-10-05T12:50:53,447][TRACE][o.e.e.NodeEnvironment ] [NORPEWMMON1] found state file: [id:51, legacy:false, file:\m-part.active.preprod\mms\master\Elasticsearch\Data\nodes\0_state\node-51.st]
[2017-10-05T12:50:53,498][TRACE][o.e.e.NodeEnvironment ] [NORPEWMMON1] state id [51] read from [node-51.st]
[2017-10-05T12:50:53,588][DEBUG][o.e.e.NodeEnvironment ] [NORPEWMMON1] using node location [[NodePath{path=\m-part.active.preprod\mms\master\Elasticsearch\Data\nodes\0, spins=null}]], local_lock_id [0]
[2017-10-05T12:50:53,596][DEBUG][o.e.e.NodeEnvironment ] [NORPEWMMON1] node data locations details:
-> \m-part.active.preprod\mms\master\Elasticsearch\Data\nodes\0, free_space [29.3gb], usable_space [29.3gb], total_space [139.9gb], spins? [unknown], mount [System (\m-part.active.preprod\mms)], type [NTFS]
[2017-10-05T12:50:53,596][INFO ][o.e.e.NodeEnvironment ] [NORPEWMMON1] heap size [3.9gb], compressed ordinary object pointers [true]
[2017-10-05T12:52:50,725][DEBUG][o.e.b.JNAKernel32Library ] console control handler receives event [0@0]

(This was followed by a similar "Could not start process within" error that is only shown on the console, not written to the log.)
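
The log above goes quiet between the heap-size line and the console control event. As a rough way to find where startup stalls in a longer log, here is a small Python sketch that finds the largest pause between consecutive timestamped lines. The function name is my own, and it assumes the default log4j2 timestamp prefix seen in this output.

```python
import re
from datetime import datetime

# Three lines copied from the log output above.
LINES = [
    "[2017-10-05T12:50:53,269][INFO ][o.e.n.Node ] [NORPEWMMON1] initializing ...",
    "[2017-10-05T12:50:53,596][INFO ][o.e.e.NodeEnvironment ] [NORPEWMMON1] "
    "heap size [3.9gb], compressed ordinary object pointers [true]",
    "[2017-10-05T12:52:50,725][DEBUG][o.e.b.JNAKernel32Library ] "
    "console control handler receives event [0@0]",
]

# Matches the leading [yyyy-MM-ddTHH:mm:ss,SSS] timestamp.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")

def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause,
    or None if fewer than two timestamped lines are found."""
    stamped = []
    for line in lines:
        m = TS_RE.match(line)
        if m:  # continuation lines without a timestamp are skipped
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S,%f")
            stamped.append((ts, line))
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best

gap, before, after = largest_gap(LINES)
print(f"{gap:.3f}s of silence after: {before[:60]}")
```

For the lines above the gap is just over 117 seconds, which lines up with the 2-minute start timeout: nothing is logged after the heap-size line until the process is told to stop.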

Please note that I do not require a resolution for this. This was a trial instance in our PreProd environment, and I have since deleted the indices and started over in order to get my demo environment back up and running. I just thought I'd add my voice to those above. FYI, my trial instance was a 1-node cluster with only roughly 50 indices and no replication. It had been running fine for at least a month until this issue was encountered.

We know that 7000 is far more than necessary. It is an old test machine, and our index cleanup had gone awry and failed to delete indices for quite a while. We don't expect much more than maybe 100. I will look into updating to the latest 5.x and see if this resolves the issue.
