Out of space issue

scaarup · April 25, 2016, 7:21am

Hi all.

I have the following in my Elasticsearch config:
path.data: /var/lib/elasticsearch/DATA2,/var/lib/elasticsearch/DATA3,/var/lib/elasticsearch/DATA1
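Each entry in path.data can sit on a different filesystem, so free space has to be checked per path rather than for the node as a whole. Here is a minimal Python sketch of such a check. It assumes the three paths above are readable by the user running it, and path_usage is an illustrative name, not part of Elasticsearch:

```python
import os
import shutil
from typing import Dict, List, Tuple

# The data paths from the path.data setting quoted above.
DATA_PATHS = [
    "/var/lib/elasticsearch/DATA2",
    "/var/lib/elasticsearch/DATA3",
    "/var/lib/elasticsearch/DATA1",
]


def path_usage(paths: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each existing path to (free bytes, percent used) of its filesystem."""
    report: Dict[str, Tuple[int, float]] = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # skip paths that do not exist on this machine
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report[path] = (usage.free, 100.0 * usage.used / usage.total)
    return report


if __name__ == "__main__":
    for path, (free, used_pct) in path_usage(DATA_PATHS).items():
        print(f"{path}: {free / 2**30:.1f} GiB free, {used_pct:.1f}% used")
```

Running it on the node would show DATA2 at or near 100% while DATA3 still has room, which matches the situation described below.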

Now I have run out of disk space on /var/lib/elasticsearch/DATA2, but I have plenty of space on /var/lib/elasticsearch/DATA3. However, Elasticsearch is in a state where it refuses to receive any more data, and it writes the following stack trace several times per second:

[2016-04-25 09:11:01,599][WARN ][cluster.action.shard     ] [mgp-es103] [audit-2016.16][2] received shard failed for target shard [[audit-2016.16][2], node[_8mzWslDT8yONVh5jO7-mw], [P], v[128564], s[INITIALIZING], a[id=Et5k8n6SRQO8r4ESbMAAPQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-04-25T07:11:00.733Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog.ckp -> /var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog-8124360722841973525.tlog: No space left on device]; ]]], indexUUID [UJk0HgzJQXeaUgQGrqUuyA], message [failed recovery], failure [IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog.ckp -> /var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog-7467623683839595706.tlog: No space left on device]; ]
[audit-2016.16][[audit-2016.16][2]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog.ckp -> /var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog-7467623683839595706.tlog: No space left on device];
	at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:250)
	at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
	at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: [audit-2016.16][[audit-2016.16][2]] EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog.ckp -> /var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog-7467623683839595706.tlog: No space left on device];
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:155)
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1515)
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1499)
	at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:972)
	at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:944)
	at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)
	... 5 more
Caused by: java.nio.file.FileSystemException: /var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog.ckp -> /var/lib/elasticsearch/DATA2/dibs/nodes/0/indices/audit-2016.16/2/translog/translog-7467623683839595706.tlog: No space left on device
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixCopyFile.copyFile(UnixCopyFile.java:253)
	at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:581)
	at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
	at java.nio.file.Files.copy(Files.java:1274)
	at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:344)
	at org.elasticsearch.index.translog.Translog.<init>(Translog.java:179)
	at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:208)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:151)
	... 11 more

Why can't ES recover from this situation? I'm running version 2.3.1.
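The stack trace places every file of shard [audit-2016.16][2], including the translog it tries to copy during recovery, under DATA2. As far as I know, Elasticsearch 2.x keeps all of a shard's files on a single data path instead of striping them, which would explain why free space on DATA3 does not help this shard. The sketch below finds which data path holds a given shard. It assumes the <data path>/<cluster name>/nodes/0/indices/<index>/<shard> layout visible in the trace, and find_shard_path is a hypothetical helper:

```python
import os
from typing import List, Optional


def find_shard_path(data_paths: List[str], cluster: str, index: str,
                    shard: int, node: int = 0) -> Optional[str]:
    """Return the shard directory under the first data path that contains it.

    Assumes the on-disk layout seen in the stack trace:
    <data path>/<cluster name>/nodes/<node>/indices/<index>/<shard>
    """
    for base in data_paths:
        candidate = os.path.join(base, cluster, "nodes", str(node),
                                 "indices", index, str(shard))
        if os.path.isdir(candidate):
            return candidate
    return None  # shard not found under any of the given paths


if __name__ == "__main__":
    paths = ["/var/lib/elasticsearch/DATA2",
             "/var/lib/elasticsearch/DATA3",
             "/var/lib/elasticsearch/DATA1"]
    # "dibs" is the cluster directory name visible in the stack trace.
    print(find_shard_path(paths, "dibs", "audit-2016.16", 2))
```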

Bruce_Ritchie · April 25, 2016, 2:12pm

I can't think of a way for ES to fix this by itself other than moving the data from DATA2 to a drive with more free space (or increasing the disk space allocated to DATA2). I'm surprised that it got into this situation, though; I would have thought that the disk allocator would have handled this (https://github.com/elastic/elasticsearch/issues/11271 and https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html). Did you disable that by chance?
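For reference, the disk allocator is controlled by cluster.routing.allocation.disk.threshold_enabled (default true) and compares disk usage against two watermarks: cluster.routing.allocation.disk.watermark.low (default 85%), above which no new shards are allocated to the node, and cluster.routing.allocation.disk.watermark.high (default 90%), above which Elasticsearch tries to relocate shards away. The Python model below is a simplified illustration of that rule, not Elasticsearch's own decider code. Note that it only decides where shards go; nothing in it stops writes to a shard that already lives on a full disk.

```python
# Elasticsearch 2.x defaults for the disk-based allocation watermarks.
LOW_WATERMARK_PCT = 85.0   # cluster.routing.allocation.disk.watermark.low
HIGH_WATERMARK_PCT = 90.0  # cluster.routing.allocation.disk.watermark.high


def allocation_decision(used_pct: float,
                        low: float = LOW_WATERMARK_PCT,
                        high: float = HIGH_WATERMARK_PCT) -> str:
    """Simplified model of the watermark rule for one disk's used percentage."""
    if used_pct >= high:
        return "relocate"       # at/above high: try to move shards elsewhere
    if used_pct >= low:
        return "no_new_shards"  # at/above low: stop allocating new shards here
    return "allow"              # below low: shards may be allocated here


if __name__ == "__main__":
    # Hypothetical usage figures, not taken from the post.
    for name, pct in [("DATA2", 100.0), ("DATA3", 40.0), ("DATA1", 70.0)]:
        print(name, allocation_decision(pct))
```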

scaarup · April 26, 2016, 6:20am

No, I haven't disabled that, and I can see it's enabled by default.
