Elasticsearch being unstable

pranav_amin · September 25, 2013, 2:46pm

Hi,

Version used - 0.90.3

I was running a performance benchmark and set out to load Elasticsearch
with 50 million documents of about 2 KB each. After some time it started
throwing errors and the cluster status went red.

Caused by: org.elasticsearch.transport.RemoteTransportException: [52][inet[/10.3.176.22:9300]][index/shard/recovery/fileChunk]
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/2/index/_rdf_es090_0.doc (No space left on device)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    ....
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-25 14:37:36,716][WARN ][cluster.action.shard ] [52] sending failed shard for [dw][2], node[tpzZTSz0R8yI_EU-faH1nA], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[dw][2]: Recovery failed from [53][Nox6jON3TIy1_H0Oe_YTxQ][inet[/10.3.176.133:9300]]{master=true} into [52][tpzZTSz0R8yI_EU-faH1nA][inet[/10.3.176.22:9300]]{master=true}]; nested: RemoteTransportException[[53][inet[/10.3.176.133:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[dw][2] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[dw][2] Failed to transfer [199] files with total size of [844.3mb]]; nested: RemoteTransportException[[52][inet[/10.3.176.22:9300]][index/shard/recovery/fileChunk]]; nested: FileNotFoundException[/var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/2/index/_rdf_es090_0.doc (No space left on device)]; ]]

The errors above were caused by the disk running out of space. That part is
fine: it happened sometime last night, and I stopped the load as soon as I
saw it. The problem is that Elasticsearch is still continuously rebalancing
the cluster; I can see it constantly moving shards between nodes. It looks
to me as if it has entered an infinite loop.
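
For anyone trying to confirm the same "shards keep moving" symptom, here is a minimal sketch in Python that summarizes a cluster health response (the JSON returned by GET /_cluster/health). The helper name summarize_health and the sample values are made up for illustration; only the field names status, relocating_shards, initializing_shards and unassigned_shards come from the health API.

```python
import json


def summarize_health(health):
    """Condense a _cluster/health response dict into the fields that matter here."""
    # Shards that are currently being moved or recovered.
    moving = health.get("relocating_shards", 0) + health.get("initializing_shards", 0)
    status = health.get("status")
    return {
        "status": status,
        "shards_in_motion": moving,
        "unassigned": health.get("unassigned_shards", 0),
        # "settled" is this sketch's own notion: green and nothing moving.
        "settled": status == "green" and moving == 0,
    }


# Sample payload shaped like a health response; the numbers are invented.
sample = json.loads(
    '{"status": "red", "relocating_shards": 2, '
    '"initializing_shards": 3, "unassigned_shards": 4}'
)
print(summarize_health(sample))
```

If shards_in_motion stays above zero across repeated polls while no load is running, that would match the loop described above.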

It keeps logging the following on some nodes:

[2013-09-25 14:42:53,478][WARN ][index.engine.robin ] [51] [dw][3] failed to read latest segment infos on flush
java.io.FileNotFoundException: /var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/3/index/_dzg.si (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:410)
    ..
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-25 14:42:54,176][WARN ][indices.cluster ] [51] [dw][2] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [dw][2]: Recovery failed from [53][Nox6jON3TIy1_H0Oe_YTxQ][inet[/10.3.176.133:9300]]{master=true} into [51][hK3pwE8IQxu8RPlyBLtZ1Q][inet[/10.3.176.140:9300]]{master=true}
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
    ..
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.RemoteTransportException: [53][inet[/10.3.176.133:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [dw][2] Phase[1] Execution failed
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1125)
    ..
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [dw][2] Failed to transfer [199] files with total size of [844.3mb]
    at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:226)
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1118)
    ... 9 more
Caused by: org.elasticsearch.transport.RemoteTransportException: [51][inet[/10.3.176.140:9300]][index/shard/recovery/fileChunk]
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/2/index/_rdf_es090_0.doc (No space left on device)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    ....
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-25 14:42:54,198][WARN ][cluster.action.shard ] [51] sending failed shard for [dw][2],

Q1) In other databases I have used, when disk space runs out the database
simply rejects the transaction, and once the load is stopped it stays in a
stable state (no continuous stream of errors). What is Elasticsearch doing
here, and why?

Q2) How do I bring the cluster back to a stable state now?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ivan · September 25, 2013, 6:02pm

Besides running out of disk space, you are also running into a "too many
open files" issue. Read the tutorial on the issue and see if it applies to
your case: Elasticsearch Platform — Find real-time answers at scale | Elastic
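
As a quick way to see which open-file limit applies, here is a minimal sketch using Python's standard resource module (Unix only). It reports the limit for the shell or user it runs under, so it only says something about Elasticsearch if run as the same user that starts the node. The function names and the 32000 threshold are assumptions for illustration, not values taken from this thread.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_too_low(soft, threshold=32000):
    """Flag a soft limit below an assumed threshold (32000 is illustrative)."""
    if soft == resource.RLIM_INFINITY:
        return False  # unlimited can never be "too low"
    return soft < threshold


soft, hard = fd_limits()
print("soft limit:", soft, "hard limit:", hard)
print("probably too low for a busy node:", looks_too_low(soft))
```

A common default soft limit of 1024 would be flagged by this check.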

If your cluster is unstable because it is trying to reallocate shards to
nodes that have no space (I have seen this issue before myself), then
perhaps you can reduce the number of replicas until you solve the other
issues.
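
Reducing replicas can be done through the update index settings API (PUT /<index>/_settings with index.number_of_replicas). Here is a minimal sketch in Python that only builds the request; the host http://localhost:9200 and the helper name are assumptions, the index name dw is taken from the logs in this thread, and the call that would actually send it is left commented out.

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed host; "dw" is the index named in the logs above.
req = replica_settings_request("http://localhost:9200", "dw", 0)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# urllib.request.urlopen(req)  # uncomment to send it to a real cluster
```

With zero replicas the cluster has no copies left to recover onto the full nodes; the replica count can be raised again the same way once disk space is sorted out.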

Cheers,

Ivan


pranav_amin · September 26, 2013, 7:30pm

Thanks for the feedback.

All the nodes restarted fine after I increased the disk space.
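
For anyone wanting to avoid hitting the same wall during a bulk load, here is a minimal sketch that checks free space on the data volume with Python's shutil.disk_usage. The function names and the 15% threshold are assumptions for illustration; on these nodes the path to check would be the data directory seen in the logs (/var/elasticsearch/elasticsearch-0.90.3/data).

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def enough_space(path, min_free=0.15):
    """True if at least `min_free` of the volume is free (threshold is arbitrary)."""
    return free_fraction(path) >= min_free


# "." is used so the sketch runs anywhere; point it at the data path on a real node.
print("free fraction:", round(free_fraction("."), 3))
print("safe to keep loading:", enough_space("."))
```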

Can you tell me what this error means (I see it on only one node), and
whether I need to fix it?

[2013-09-26 19:28:10,111][INFO ][index.gateway.local ] [51] [dw][1] ignoring recovery of a corrupt translog entry
org.elasticsearch.index.mapper.MapperParsingException: failed to parse
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:554)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:451)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:329)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:618)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:223)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 0)): has to be escaped using backslash to be included in name
 at [Source: [B@7c6cb947; line: 117, column: 5]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1369)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:599)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:560)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.parseEscapedFieldName(UTF8StreamJsonParser.java:1625)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.parseFieldName(UTF8StreamJsonParser.java:1589)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._parseFieldName(UTF8StreamJsonParser.java:1457)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:668)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:50)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:469)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:507)
    ... 8 more
[2013-09-26 19:28:10,869][WARN ][indices.cluster ] [51] [dw][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [dw][1] failed to recover shard
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:237)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.IllegalArgumentException: No type mapped for [105]
    at org.elasticsearch.index.translog.Translog$Operation$Type.fromId(Translog.java:216)
    at org.elasticsearch.index.translog.TranslogStreams.readTranslogOperation(TranslogStreams.java:34)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:214)
    ... 4 more
[2013-09-26 19:28:10,869][WARN ][indices.memory ] [51] failed to set shard [dw][1] index buffer to [19.8mb]
[2013-09-26 19:28:10,915][WARN ][index.translog ] [51] [dw][1] failed to flush shard on translog threshold
java.lang.NullPointerException
    at org.elasticsearch.index.translog.fs.FsTranslog.revertTransient(FsTranslog.java:302)
    at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:902)
    at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:502)
    at org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:186)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-26 19:28:10,947][WARN ][cluster.action.shard ] [51] sending failed shard for [dw][1], node[Cj2s19AET0OUl_RZ3Cgq0g], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[dw][1] failed to recover shard]; nested: IllegalArgumentException[No type mapped for [105]]; ]]

Thanks


pranav_amin · September 27, 2013, 3:40pm

Any update on the error from my previous post (the corrupt translog entry
and the failed recovery of shard [dw][1] on node 51)?
