No space - disaster


#1

Running out of space on the elasticsearch-server is a total disaster. After freeing up some space, I am trying to start ES again, but the following error continues to appear in the log:

[2015-11-09 14:56:38,021][WARN ][cluster.action.shard     ] [Terrax the Tamer] [fe-2015.11.09][2] received shard failed for [fe-2015.11.09][2], node[-Xyzs56LQIqGOqbYDns1eQ], [P], v[7], s[INITIALIZING], a[id=wD4WjCX0TyiLpgV-PUzdnQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-11-09T13:56:38.003Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp]; ]], indexUUID [2cReJ5KXR2yaywjvM7SO0g], message [failed recovery], failure [IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp]; ]
[fe-2015.11.09][[fe-2015.11.09][2]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp];
	at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:258)
	at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:60)
	at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: [fe-2015.11.09][[fe-2015.11.09][2]] EngineCreationFailureException[failed to create engine]; nested: FileAlreadyExistsException[/var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp];
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:135)
	at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
	at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1349)
	at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1344)
	at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:889)
	at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:866)
	at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:249)
	... 5 more
Caused by: java.nio.file.FileAlreadyExistsException: /var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog/translog-13.ckp
	at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:551)
	at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
	at java.nio.file.Files.copy(Files.java:1274)
	at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:304)
	at org.elasticsearch.index.translog.Translog.<init>(Translog.java:166)
	at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:188)
	at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:131)
	... 11 more

It actually appear so many times, that the ES logfile is more than 10GB, causing the server to be out of space again...

And something strange happens in todays index while ES is stacktracing.
35G d2-2015.11.09
Currently 35GB and normally a day is about 8GB. Seems like some sort of an insane loop situation...

How do I recover from this?


(Simon Willnauer) #2

hey this seems like a bug to me and I think I see what the reason is. Can I ask you to:

  • open an issue for this here: https://github.com/elastic/elasticsearch/issues/
  • link this entry here in the issue
  • list all files in /var/lib/elasticsearch/elasticsearch/nodes/0/indices/fe-2015.11.09/2/translog
  • try to start that node again with logging for index.translog: DEBUG and index.engine: DEBUG and paste the output on the issue

thanks so much for reporting this.


(Simon Willnauer) #3

@scaarup did you open that issue or should I do it? I can but I'd still appreciate the requested information


#4

Hi @s1monw.

No I didn't because I had to recover and I therefore deleted all my indices. Meaning I cannot provide you with what you requested.

Sorry about that.


(Simon Willnauer) #5

bummer! would always be good to safe the data so we can reproduce this. I think I found the issue but I can't be 100% sure now https://github.com/elastic/elasticsearch/pull/14695


(Alphan Es) #6

I am not sure if this is still helpful. I was trying out ES and encountered the same issue. You can find the part of the zipped log and the directory tree here : https://www.dropbox.com/s/gze0sssiv67fh1e/issue_14695_logs.tar.gz?dl=0


(Ikenna Darlington) #7

Hello, I am currently experiencing a similar problem is there a solution to this?


(Simon Willnauer) #8

can you provide all information from logs what kind of problem you are having as well as which version you are running?


(Simon Willnauer) #9

another question is did you have any issues with running out of disk space?


(bharath) #10

i am also having the same issue

more details regarding mem i still have 80%
JVM another 60%

"os":{"timestamp":1449649876869,"load_average":0.04,"mem":{"total_in_bytes":33535873024,"free_in_bytes":26795671552,"used_in_bytes":6740201472,"free_percent":80,"used_percent":20},"swap":{"total_in_bytes":0,"free_in_bytes":0,"used_in_bytes":0}},"process":{"timestamp":1449649876869,"open_file_descriptors":218,"max_file_descriptors":4096,"cpu":{"percent":2,"total_in_millis":16846170},"mem":{"total_virtual_in_bytes":4826849280}},"jvm":{"timestamp":1449649876869,"uptime_in_millis":170503757,"mem":{"heap_used_in_bytes":434470248,"heap_used_percent":41,"heap_committed_in_bytes":649949184,"heap_max_in_bytes":1037959168,"non_heap_used_in_bytes":62325864,"non_heap_committed_in_bytes":92753920,"pools":{"young":{"used_in_bytes":40399312,"max_in_bytes":286326784,"peak_used_in_bytes":71696384,"peak_max_in_bytes":286326784},"survivor":{"used_in_bytes":8288368,"max_in_bytes":35782656,"peak_used_in_bytes":8912896,"peak_max_in_bytes":35782656},"old":{"used_in_bytes":385782568,"max_in_bytes":715849728,"peak_used_in_bytes":431855616,"peak_max_in_bytes":715849728}}},


(Ikenna Darlington) #11

yes I did


(Ikenna Darlington) #12

I couldn't provide the log files because I deleted them, they were quite large : upto 35G.

I deleted some translog files in some nodes and rebooted, it seems to have solved the issue.

I hope no problem arises.


(Simon Willnauer) #13

this issue has been fixed in 2.1 and 2.0.1 so please upgrade.


(Ikenna Darlington) #14

Ok, we are Currently using 2.0, I would like to know what makes the log files become so large.
it led to running out of disk space.


(Simon Willnauer) #15

I can't tell - would need to look at it's content which is tricky. Can you maybe look if you see something? I suspect some guice errors :frowning:


(Raghvendra Singh) #16

Hey @s1monw , I have got the same issue in es 5.4.0. When i'm trying to start the elasticsearch via crontab.. it's giving these logs .

2017-06-29T11:58:02,106][INFO ][o.e.n.Node ] [master] initializing ...
[2017-06-29T11:58:02,218][INFO ][o.e.e.NodeEnvironment ] [master] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [725.7gb], net total_space [908.4gb], spins? [possibly], types [ext4]
[2017-06-29T11:58:02,219][INFO ][o.e.e.NodeEnvironment ] [master] heap size [3.9gb], compressed ordinary object pointers [true]
[2017-06-29T11:58:02,254][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [master] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to created node environment
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.4.0.jar:5.4.0]
Caused by: java.lang.IllegalStateException: Failed to created node environment
at org.elasticsearch.node.Node.(Node.java:265) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.0.jar:5.4.0]
... 6 more
Caused by: java.io.IOException: failed to write in data directory [/home/mywavia/Downloads/elasticsearch-5.4.0/data/nodes/0/indices/A1JkmYVoSpGRNrJmATzwRg/_state] write permission is required
at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1075) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1047) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:277) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:262) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.0.jar:5.4.0]
... 6 more
Caused by: java.nio.file.FileAlreadyExistsException: /home/mywavia/Downloads/elasticsearch-5.4.0/data/nodes/0/indices/A1JkmYVoSpGRNrJmATzwRg/_state/.es_temp_file
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_131]
at java.nio.file.Files.createFile(Files.java:632) ~[?:1.8.0_131]
at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1072) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1047) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.env.NodeEnvironment.(NodeEnvironment.java:277) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:262) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.node.Node.(Node.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap$6.(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.4.0.jar:5.4.0]

Can you help me with this ?


(Raghvendra Singh) #17

Hey .. when I went to that directory , and opened that file , it was empty . So , without having a second thought , i deleted that file and restarted the node , now every thing is working fine .
But what I have done , was something , I was supposed to do , or there is any other better options that could have been tried ??


(Simon Willnauer) #18

Hey .. when I went to that directory , and opened that file , it was empty . So , without having a second thought , i deleted that file and restarted the node , now every thing is working fine .
But what I have done , was something , I was supposed to do , or there is any other better options that could have been tried ??

we try to write a file in the data directory to ensure we have the correct permission. Yet if something fails during the node startup we might leave a file behind. There is a pull request open where I have concerns. https://github.com/elastic/elasticsearch/pull/21210 deleting the file was the right thing to resolve the condition, do you know how you got into this? did you kill the node while it was starting?


(Raghvendra Singh) #19

Yes .. actually , this get started on startup via crontab scheduler so every time when I start my node , ES also starts , but yesterday , I didn't want to start ES , so I had killed all java process by

sudo killall -2 java .

May be that's how I got into this error