Total data loss due to disk space issues


(erik.seilnacht@mulesoft.com) #1

We had a problem over the weekend where running out of local disk space caused errors in Elasticsearch such as:

org.elasticsearch.index.translog.TranslogException: [index][3] Failed to write operation [org.elasticsearch.index.translog.Translog$Create@3583a4bc]
at org.elasticsearch.index.translog.fs.FsTranslog.add(FsTranslog.java:181)
at org.elasticsearch.index.engine.robin.RobinEngine.innerCreate(RobinEngine.java:361)
at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:266)
at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:272)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:191)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:418)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.access$100(TransportShardReplicationOperationAction.java:233)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:331)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcher.pwrite0(Native Method)
at sun.nio.ch.FileDispatcher.pwrite(FileDispatcher.java:45)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:648)

After a series of these, the server spontaneously restarted and tried to rebuild the indices. This also failed due to probable corruption in the transaction log or the persistent disk space problem.

org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index][1] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:164)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:144)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [message.text]
at org.elasticsearch.index.mapper.xcontent.AbstractFieldMapper.parse(AbstractFieldMapper.java:300)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeValue(ObjectMapper.java:419)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:323)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(ObjectMapper.java:344)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:313)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:451)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:380)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareCreate(InternalIndexShard.java:258)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:518)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:159)
... 4 more
Caused by: org.elasticsearch.common.jackson.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in string value
at [Source: [B@7b0906da; line: 1, column: 1000]
at org.elasticsearch.common.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.elasticsearch.common.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.elasticsearch.common.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:346)
at org.elasticsearch.common.jackson.impl.Utf8StreamParser._finishString2(Utf8StreamParser.java:1464)
at org.elasticsearch.common.jackson.impl.Utf8StreamParser._finishString(Utf8StreamParser.java:1394)
at org.elasticsearch.common.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:113)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:74)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:99)
at org.elasticsearch.index.mapper.xcontent.StringFieldMapper.parseCreateField(StringFieldMapper.java:163)
at org.elasticsearch.index.mapper.xcontent.StringFieldMapper.parseCreateField(StringFieldMapper.java:40)
at org.elasticsearch.index.mapper.xcontent.AbstractFieldMapper.parse(AbstractFieldMapper.java:287)
... 13 more
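
The `CTRL-CHAR, code 0` in the trace above means the parser hit a raw NUL byte inside a string value. One plausible but unconfirmed cause is a record that was only partly written when the disk filled up. The sketch below reproduces the same class of failure. It uses Python's `json` module rather than the Jackson parser from the trace, and the damaged record is a made-up example, not data from the actual translog.

```python
import json


def try_parse(raw: bytes):
    """Return (True, document) on success or (False, error detail) on failure."""
    try:
        return True, json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError as exc:
        return False, f"{exc.msg} (column {exc.colno})"


# A well-formed record with the same field path as in the trace.
good = b'{"message": {"text": "hello world"}}'

# Hypothetical damaged record: the end of the string value is NUL bytes.
bad = b'{"message": {"text": "hello\x00\x00\x00"}}'

good_result = try_parse(good)
bad_result = try_parse(bad)
print(good_result)
print(bad_result)
```

A strict JSON parser rejects unescaped control characters inside strings, so a single damaged record is enough to make replaying the log fail.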

After a number of these messages, there were additional errors:

[2011-07-23 23:11:28,818][WARN ][cluster.action.shard ] [Screech] sending failed shard for [index][2], node[iS66iEEyTu2Vw_flvz2AyQ], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index][2] failed to recover shard]; nested: MapperParsingException[Failed to parse [message.text]]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in string value
at [Source: [B@5cdc6180; line: 1, column: 42]]; ]]

Eventually the pre-existing shards were all deleted and the server created new ones, resulting in total data loss.

[2011-07-23 23:11:34,601][DEBUG][index.shard.service ] [Screech] [index][3] state: [CREATED]->[RECOVERING], reason [from gateway]
[2011-07-23 23:11:34,601][DEBUG][indices.cluster ] [Screech] [index][1] cleaning shard locally (not allocated)
[2011-07-23 23:11:34,601][DEBUG][index.service ] [Screech] [index] deleting shard_id [1]
[2011-07-23 23:11:34,601][DEBUG][index.shard.service ] [Screech] [index][1] state: [RECOVERING]->[CLOSED], reason [cleaning shard locally (not allocated)]
[2011-07-23 23:11:34,630][DEBUG][monitor.jvm ] [Screech] [gc][ParNew][17] took [157ms]/[765ms], reclaimed [14mb], leaving [167.5mb] used, max [4.1gb]
[2011-07-23 23:11:34,702][DEBUG][index.gateway ] [Screech] [index][3] starting recovery from local ...
[2011-07-23 23:11:34,704][DEBUG][index.engine.robin ] [Screech] [index][3] Starting engine

We have a backup of the data and are planning to move to a 4-node cluster, but I'm curious how this could have been prevented. Are there any policy settings in the config to force the server to shut down instead of deleting indices?

Thanks,

-Erik
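
On the prevention question, one option that does not depend on any Elasticsearch setting is to have the indexing client (or a small watchdog) check free space and refuse to write before the disk actually fills up. Here is a minimal sketch of that idea. The function names and the 10% threshold are assumptions for illustration, and it uses only Python's standard `shutil.disk_usage`, not an Elasticsearch API.

```python
import shutil


def has_headroom(path: str, min_free_fraction: float = 0.10) -> bool:
    """True if the filesystem holding `path` has at least the given free fraction."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Unusual filesystem reporting no capacity: treat as no headroom.
        return False
    return usage.free / usage.total >= min_free_fraction


def guarded_write(path: str, write_fn, min_free_fraction: float = 0.10):
    """Run `write_fn` only if there is enough free space; otherwise fail loudly."""
    if not has_headroom(path, min_free_fraction):
        raise RuntimeError(f"refusing to write: low disk space on {path!r}")
    return write_fn()


# Hypothetical usage: "." stands in for the node's data directory.
print(has_headroom("."))
```

Failing the write on the client side keeps the node from ever reaching the `No space left on device` state, at the cost of rejecting documents slightly early.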


(ppearcy) #2

Curious, what version of ES are you running?

I know there were some bugs in a previous version of Lucene (3.0.0, I
think, and I believe this is the issue that was tracking it:
https://issues.apache.org/jira/browse/LUCENE-2811) where we hit
something similar.


(erik.seilnacht@mulesoft.com) #3

We are running Elasticsearch 0.16.2 and Lucene 3.1.0.

Looks like the Lucene bug was fixed in 3.1?

Thanks,

-Erik


(ppearcy) #4

Yeah, 0.16.2 wouldn't have the issue I am referring to. Must be
something different.


(Shay Banon) #5

How many nodes were you running?


(erik.seilnacht@mulesoft.com) #6

Only one node at the moment. We are in the process of setting up a 4-node cluster, however.


(Shay Banon) #7

Yeah, that explains it. It seems like the metadata file (index creation,
mappings, and so on) that elasticsearch wrote (and fsync'ed) to disk got
corrupted. If you had more nodes, then another metadata file would have
been read.

That said, I do want to improve that logic even for one node (to be more
resistant to corruption) by possibly storing several historic metadata
files, and falling back to an older one if the latest fails to load.

-shay.banon
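
The "several historic metadata files" idea can be sketched roughly as follows. This is only an illustration of the general technique and not how Elasticsearch stores or recovers its metadata. The file naming, the SHA-256 checksum line, and the sample metadata are all assumptions.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def write_generation(directory: Path, generation: int, metadata: dict) -> Path:
    """Write one metadata generation as '<sha256 hex>\\n<json>' and fsync it."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    target = directory / f"metadata-{generation:08d}"
    tmp = target.with_suffix(".tmp")
    with open(tmp, "wb") as fh:
        fh.write(digest + b"\n" + payload)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, target)  # atomic rename over the final name
    return target


def load_latest_valid(directory: Path):
    """Return (file name, metadata) for the newest generation that verifies."""
    for path in sorted(directory.glob("metadata-*"), reverse=True):
        if path.suffix == ".tmp":
            continue  # leftover from an interrupted write
        digest, _, payload = path.read_bytes().partition(b"\n")
        if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
            continue  # checksum mismatch: try the next older generation
        try:
            return path.name, json.loads(payload)
        except ValueError:
            continue
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir)
    write_generation(data_dir, 1, {"indices": ["index"]})
    newest = write_generation(data_dir, 2, {"indices": ["index", "index2"]})
    newest.write_bytes(b"\x00" * 64)  # simulate a corrupted newest file
    recovered = load_latest_valid(data_dir)
    print(recovered)
```

With a scheme like this, a corrupted latest file costs only the most recent metadata change, because the loader falls back to the previous generation instead of starting with no metadata at all.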


(Erik Seilnacht) #8

Got it. Thanks for the explanation.

-Erik

--
Erik Seilnacht
Software Architect
MuleSoft Inc.
30 Maiden Lane Suite #500
San Francisco, CA 94108
Phone: 415.846.0203

