We had a problem over the weekend where running out of local disk space caused errors in elasticsearch such as:
org.elasticsearch.index.translog.TranslogException: [index][3] Failed to write operation [org.elasticsearch.index.translog.Translog$Create@3583a4bc]
at org.elasticsearch.index.translog.fs.FsTranslog.add(FsTranslog.java:181)
at org.elasticsearch.index.engine.robin.RobinEngine.innerCreate(RobinEngine.java:361)
at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:266)
at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:272)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:191)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:418)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.access$100(TransportShardReplicationOperationAction.java:233)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:331)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcher.pwrite0(Native Method)
at sun.nio.ch.FileDispatcher.pwrite(FileDispatcher.java:45)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:648)
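In hindsight, even a trivial free-space check on the data filesystem would have flagged this before the translog writes started failing. A minimal sketch of what I mean (the 5 GB threshold is arbitrary, and "/" stands in for the actual elasticsearch data path):

```python
import os

def free_bytes(path):
    """Bytes available to non-root users on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

def disk_ok(path, min_free):
    """True if the filesystem has at least `min_free` bytes available."""
    return free_bytes(path) >= min_free

# Example: warn when the data filesystem drops below 5 GB free.
if not disk_ok("/", 5 * 1024 ** 3):
    print("low disk space: stop indexing before the translog fills the disk")
```

Run from cron, something like this would at least have given us warning before the IOExceptions started.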
After a series of these, the server spontaneously restarted and tried to rebuild the indices. This also failed, presumably due to corruption in the transaction log or the continuing disk space problem.
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index][1] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:164)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:144)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [message.text]
at org.elasticsearch.index.mapper.xcontent.AbstractFieldMapper.parse(AbstractFieldMapper.java:300)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeValue(ObjectMapper.java:419)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:323)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(ObjectMapper.java:344)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:313)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:451)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:380)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:258)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:518)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:159)
... 4 more
Caused by: org.elasticsearch.common.jackson.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in string value
at [Source: [B@7b0906da; line: 1, column: 1000]
at org.elasticsearch.common.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.elasticsearch.common.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.elasticsearch.common.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:346)
at org.elasticsearch.common.jackson.impl.Utf8StreamParser._finishString2(Utf8StreamParser.java:1464)
at org.elasticsearch.common.jackson.impl.Utf8StreamParser._finishString(Utf8StreamParser.java:1394)
at org.elasticsearch.common.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:113)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:74)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:99)
at org.elasticsearch.index.mapper.xcontent.StringFieldMapper.parseCreateField(StringFieldMapper.java:163)
at org.elasticsearch.index.mapper.xcontent.StringFieldMapper.parseCreateField(StringFieldMapper.java:40)
at org.elasticsearch.index.mapper.xcontent.AbstractFieldMapper.parse(AbstractFieldMapper.java:287)
... 13 more
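That last exception is consistent with the translog having been truncated or zero-filled when the disk ran out: a raw NUL byte inside a JSON string is illegal, and any strict parser rejects it the same way Jackson does here. A quick illustration using Python's json module (standing in for Jackson):

```python
import json

# A well-formed document parses fine.
json.loads('{"message": {"text": "hello"}}')

# The same document with an embedded NUL byte, as you would get from a
# zero-filled region of a damaged translog, is rejected: control
# characters must be backslash-escaped inside JSON string values.
try:
    json.loads('{"message": {"text": "hel\x00lo"}}')
except ValueError as e:
    print(e)  # "Invalid control character ..."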
after a number of these messages, there were additional errors:
[2011-07-23 23:11:28,818][WARN ][cluster.action.shard ] [Screech] sending failed shard for [index][2], node[iS66iEEyTu2Vw_flvz2AyQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index][2] failed to recover shard]; nested: MapperParsingException[Failed to parse [message.text]]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in string value
at [Source: [B@5cdc6180; line: 1, column: 42]]; ]]
eventually the pre-exiting shards were all deleted and new ones created by the server resulting in total data loss.
[2011-07-23 23:11:34,601][DEBUG][index.shard.service ] [Screech] [index][3] state: [CREATED]->[RECOVERING], reason [from gateway]
[2011-07-23 23:11:34,601][DEBUG][indices.cluster ] [Screech] [index][1] cleaning shard locally (not allocated)
[2011-07-23 23:11:34,601][DEBUG][index.service ] [Screech] [index] deleting shard_id [1]
[2011-07-23 23:11:34,601][DEBUG][index.shard.service ] [Screech] [index][1] state: [RECOVERING]->[CLOSED], reason [cleaning shard locally (not allocated)]
[2011-07-23 23:11:34,630][DEBUG][monitor.jvm ] [Screech] [gc][ParNew][17] took [157ms]/[765ms], reclaimed [14mb], leaving [167.5mb] used, max [4.1gb]
[2011-07-23 23:11:34,702][DEBUG][index.gateway ] [Screech] [index][3] starting recovery from local ...
[2011-07-23 23:11:34,704][DEBUG][index.engine.robin ] [Screech] [index][3] Starting engine
We have a backup of the data, and are planning to move to a 4 node cluster, but just curious how this could have been prevented. Are there any policy settings in the config to force the server to shutdown instead of deleting indices?
Thanks,
-Erik