After server disk full, indexes unavailable


(drahmel7) #1

My CentOS box hosting Elasticsearch ran out of disk space. After clearing some space and bringing ES back up, all indexes are unavailable. If I do this:

curl -XGET http://localhost:9200/_status

I get: {"ok":true,"_shards":{"total":50,"successful":0,"failed":0},"indices":{}}

If I query my index: curl -XGET http://localhost:9200/cast/_status

I get: {"ok":true,"_shards":{"total":10,"successful":0,"failed":0},"indices":{}}

If I do: curl -XGET 'http://localhost:9200/_cluster/health'

I get: {"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":50}
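A quick way to pull just the status field out of that response (the literal JSON below stands in for the curl output above; normally it would come straight from the _cluster/health call):

```shell
# Stand-in for: HEALTH=$(curl -s 'http://localhost:9200/_cluster/health')
HEALTH='{"cluster_name":"elasticsearch","status":"red","unassigned_shards":50}'
# Extract the value of the "status" field.
STATUS=$(echo "$HEALTH" | grep -o '"status":"[a-z]*"' | cut -d'"' -f4)
echo "cluster status: $STATUS"
```

With status "red" and every shard unassigned, no primary has started, which matches the recovery failures in the log below.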

When I look at the log I see this:

[2011-10-07 10:18:30,188][WARN ][indices.cluster ] [Mountjoy] [cast][2] failed to start shard
org.elasticsearch.index.shard.recovery.RecoveryFailedException: Index Shard [cast][2]: Recovery failed from [Mogul of the Mystic Mountain][p-w0IjpeTYqR4EbPXezzOQ][inet[/10.7.88.73:9300]] into [Mountjoy][cJoOqTimSiqAzfsyJMQlLg][inet[/10.7.88.73:9301]]
at org.elasticsearch.index.shard.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:253)
at org.elasticsearch.index.shard.recovery.RecoveryTarget.access$100(RecoveryTarget.java:71)
at org.elasticsearch.index.shard.recovery.RecoveryTarget$2.run(RecoveryTarget.java:156)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Mogul of the Mystic Mountain][inet[/10.7.88.73:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [cast][2] Phase[1] Execution failed
at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1006)
at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:420)
at org.elasticsearch.index.shard.recovery.RecoverySource.recover(RecoverySource.java:110)
at org.elasticsearch.index.shard.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at org.elasticsearch.index.shard.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:296)
at org.elasticsearch.index.shard.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:285)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:238)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.index.shard.recovery.RecoverFilesRecoveryException: [cast][2] Failed to transfer [201] files with total size of [16.1gb]
at org.elasticsearch.index.shard.recovery.RecoverySource$1.phase1(RecoverySource.java:207)
at org.
[2011-10-07 11:07:34,844][WARN ][indices.cluster ] [Mountjoy] [cast][3] failed to start shard

Is the index corrupted? Is there any way to recover it?

Dan


(johno) #2

I have almost precisely the same problem. If there is a known fix, please post it!

Elasticsearch appears to be especially sensitive to this kind of thing. I had logging going to a different filesystem, and after the index filesystem ran out of space, ES proceeded to fill a couple hundred gigabytes of logs on the other filesystem, dumping entire documents into the log. Since the bulk indexing run took twelve hours, I'd prefer to recover what I can.
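In case it helps anyone else: I'm going to try raising the root log level so per-document failure dumps can't flood the log filesystem again. This is only a sketch against the 0.x-style config/logging.yml; the exact keys may differ on your version:

```yaml
# config/logging.yml (assumption: 0.x log4j-style config; adjust to your install)
# Raise the root level from INFO to WARN so failed-document dumps
# don't fill the log filesystem.
rootLogger: WARN, console, file
```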


(system) #3