No space - disaster

scaarup · November 9, 2015, 3:01pm

Running out of space on the elasticsearch-server is a total disaster. After freeing up some space, I am trying to start ES again, but the following error continues to appear in the log:

[2015-11-09 14:56:38,021][WARN ][cluster.action.shard     ] [Terrax the Tamer] [fe-2015.11.09][2] received shard failed for [fe-2015.11.09][2], node[-Xyzs56LQIqGOqbYDns1eQ], [P], v[7], s[INITIALIZING], a[id=wD4WjCX0TyiLpgV-PUzdnQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-11-09T13:56:38.003Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp]; ]], indexUUID [2cReJ5KXR2yaywjvM7SO0g], message [failed recovery], failure [IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp]; ]
[fe-2015.11.09][[fe-2015.11.09][2]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp];
	at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:258)
	at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:60)
	at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: [fe-2015.11.09][[fe-2015.11.09][2]] EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp];
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:135)
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1349)
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1344)
	at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:889)
	at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:866)
	at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:249)
	... 5 more
Caused by: java.nio.file.FileAlreadyExistsException: /var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp
	at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:551)
	at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
	at java.nio.file.Files.copy(Files.java:1274)
	at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:304)
	at org.elasticsearch.index.translog.Translog.<init>(Translog.java:166)
	at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:188)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:131)
	... 11 more

It actually appear so many times, that the ES logfile is more than 10GB, causing the server to be out of space again...

And something strange happens in todays index while ES is stacktracing.
35G d2-2015.11.09
Currently 35GB and normally a day is about 8GB. Seems like some sort of an insane loop situation...

How do I recover from this?

s1monw · November 9, 2015, 6:44pm

hey this seems like a bug to me and I think I see what the reason is. Can I ask you to:

open an issue for this here: https://github.com/elastic/elasticsearch/issues/
link this entry here in the issue
list all files in /var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog
try to start that node again with logging for index.translog: DEBUG and index.engine: DEBUG and paste the output on the issue

thanks so much for reporting this.

s1monw · November 11, 2015, 9:14am

@scaarup did you open that issue or should I do it? I can but I'd still appreciate the requested information

scaarup · November 11, 2015, 10:18am

Hi @s1monw.

No I didn't because I had to recover and I therefore deleted all my indices. Meaning I cannot provide you with what you requested.

Sorry about that.

s1monw · November 11, 2015, 8:04pm

bummer! would always be good to safe the data so we can reproduce this. I think I found the issue but I can't be 100% sure now https://github.com/elastic/elasticsearch/pull/14695

alphan · November 16, 2015, 4:28pm

I am not sure if this is still helpful. I was trying out ES and encountered the same issue. You can find the part of the zipped log and the directory tree here : https://www.dropbox.com/s/gze0sssiv67fh1e/issue_14695_logs.tar.gz?dl=0

Idarlington · December 9, 2015, 10:23am

Hello, I am currently experiencing a similar problem is there a solution to this?

s1monw · December 9, 2015, 10:42am

can you provide all information from logs what kind of problem you are having as well as which version you are running?

s1monw · December 9, 2015, 10:54am

another question is did you have any issues with running out of disk space?

bharath · December 9, 2015, 11:14am

i am also having the same issue

more details regarding mem i still have 80%
JVM another 60%

"os":{"timestamp":1449649876869,"load_average":0.04,"mem":{"total_in_bytes":33535873024,"free_in_bytes":26795671552,"used_in_bytes":6740201472,"free_percent":80,"used_percent":20},"swap":{"total_in_bytes":0,"free_in_bytes":0,"used_in_bytes":0}},"process":{"timestamp":1449649876869,"open_file_descriptors":218,"max_file_descriptors":4096,"cpu":{"percent":2,"total_in_millis":16846170},"mem":{"total_virtual_in_bytes":4826849280}},"jvm":{"timestamp":1449649876869,"uptime_in_millis":170503757,"mem":{"heap_used_in_bytes":434470248,"heap_used_percent":41,"heap_committed_in_bytes":649949184,"heap_max_in_bytes":1037959168,"non_heap_used_in_bytes":62325864,"non_heap_committed_in_bytes":92753920,"pools":{"young":{"used_in_bytes":40399312,"max_in_bytes":286326784,"peak_used_in_bytes":71696384,"peak_max_in_bytes":286326784},"survivor":{"used_in_bytes":8288368,"max_in_bytes":35782656,"peak_used_in_bytes":8912896,"peak_max_in_bytes":35782656},"old":{"used_in_bytes":385782568,"max_in_bytes":715849728,"peak_used_in_bytes":431855616,"peak_max_in_bytes":715849728}}},

Idarlington · December 9, 2015, 12:13pm

yes I did

Idarlington · December 9, 2015, 12:15pm

I couldn't provide the log files because I deleted them, they were quite large : upto 35G.

I deleted some translog files in some nodes and rebooted, it seems to have solved the issue.

I hope no problem arises.

s1monw · December 9, 2015, 12:36pm

this issue has been fixed in 2.1 and 2.0.1 so please upgrade.

Idarlington · December 9, 2015, 12:45pm

Ok, we are Currently using 2.0, I would like to know what makes the log files become so large.
it led to running out of disk space.

s1monw · December 9, 2015, 12:46pm

I can't tell - would need to look at it's content which is tricky. Can you maybe look if you see something? I suspect some guice errors

raghvendra · June 29, 2017, 6:33am

Hey @s1monw , I have got the same issue in es 5.4.0. When i'm trying to start the elasticsearch via crontab.. it's giving these logs .

2017-06-29T11:58:02,106][INFO ][o.e.n.Node ] [master] initializing ...
[2017-06-29T11:58:02,218][INFO ][o.e.e.NodeEnvironment ] [master] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [725.7gb], net total_space [908.4gb], spins? [possibly], types [ext4]
[2017-06-29T11:58:02,219][INFO ][o.e.e.NodeEnvironment ] [master] heap size [3.9gb], compressed ordinary object pointers [true]
[2017-06-29T11:58:02,254][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [master] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to created node environment
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.4.0.jar:5.4.0]
Caused by: java.lang.IllegalStateException: Failed to created node environment
at org.elasticsearch.node.Node.(Node.java:265) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.0.jar:5.4.0]
... 6 more
Caused by: java.io.IOException: failed to write in data directory [/home/mywavia/Downloads/elasticsearch-5.4.0/data/nodes/0/indices/A1JkmYVoSpGRNrJmATzwRg/_state] write permission is required
at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1075) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1047) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:277) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:262) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.0.jar:5.4.0]
... 6 more
Caused by: java.nio.file.FileAlreadyExistsException: /home/mywavia/Downloads/elasticsearch-5.4.0/data/nodes/0/indices/A1JkmYVoSpGRNrJmATzwRg/_state/.es_temp_file
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_131]
at java.nio.file.Files.createFile(Files.java:632) ~[?:1.8.0_131]
at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1072) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1047) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:277) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:262) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.0.jar:5.4.0]

Can you help me with this ?

raghvendra · June 29, 2017, 7:06am

Hey .. when I went to that directory , and opened that file , it was empty . So , without having a second thought , i deleted that file and restarted the node , now every thing is working fine .
But what I have done , was something , I was supposed to do , or there is any other better options that could have been tried ??

s1monw · June 29, 2017, 7:29am

Hey .. when I went to that directory , and opened that file , it was empty . So , without having a second thought , i deleted that file and restarted the node , now every thing is working fine .
But what I have done , was something , I was supposed to do , or there is any other better options that could have been tried ??

we try to write a file in the data directory to ensure we have the correct permission. Yet if something fails during the node startup we might leave a file behind. There is a pull request open where I have concerns. Node should start up despite of a lingering `.es_temp_file` by bleskes · Pull Request #21210 · elastic/elasticsearch · GitHub deleting the file was the right thing to resolve the condition, do you know how you got into this? did you kill the node while it was starting?

raghvendra · June 29, 2017, 7:59am

Yes .. actually , this get started on startup via crontab scheduler so every time when I start my node , ES also starts , but yesterday , I didn't want to start ES , so I had killed all java process by

sudo killall -2 java .

May be that's how I got into this error

Topic		Replies	Views
Total dataloss due to disk space issues Elasticsearch	8	470	July 6, 2017
Elasticsearch being unstable Elasticsearch	4	1499	July 6, 2017
ES failed to recover after crash Elasticsearch	9	3602	July 6, 2017
NoSuchFileException of the "translog.ckp" file Elasticsearch	15	2090	January 27, 2020
Out of space issue Elasticsearch	3	4393	July 5, 2017

No space - disaster

Related topics