Help - ES error


(Gustavo Maia) #1

Hi,

I got the following error.
My cluster has seven machines, and the indices are organized into 10 shards
with one replica. I am using the local gateway.
One host ran out of disk space. When that happened, I restarted that host.
When I did this (restarting the ES service), ES copied its shards to other
machines in the cluster, and two more hosts in the cluster then hit the same
problem. With three hosts in the cluster out of disk space, I restarted the
whole cluster. When the cluster restarted, ES deleted all indices on all
hosts, and I can't recover them.
Can someone explain what I did wrong and what the best setup for the gateway
would be?

Below is the ES log.

[20:53:04,677][WARN ][cluster.action.shard ] [Hippolyta] sending failed
shard for [diarioindex][9], node[2HYzKtiTS8-NPxQAhmwndg], [R], s[STARTED],
reason [Failed to perform [index] on replica, message
[RemoteTransportException[[Dreamqueen][inet[/64.91.231.162:9300]][index/replica]];
nested: IndexFailedEngineException[[diarioindex][9] Index failed for
[diarioindexmap#24655240]]; nested: IOException[No space left on device]; ]]
[20:53:07,012][WARN ][action.index ] [Hippolyta] Failed to
perform index on replica Index Shard [diarioindex][6]
org.elasticsearch.transport.RemoteTransportException:
[Man-Spider][inet[/64.91.231.160:9300]][index/replica]
Caused by: org.elasticsearch.index.engine.IndexFailedEngineException:
[diarioindex][6] Index failed for [diarioindexmap#27403456]
at
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:484)
at
org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:323)
at
org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:260)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:251)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:237)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:374)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
at
org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:448)
at
org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:63)
at
org.elasticsearch.index.store.Store$StoreIndexOutput.flushBuffer(Store.java:580)
at
org.apache.lucene.store.OpenBufferedIndexOutput.flushBuffer(OpenBufferedIndexOutput.java:101)
at
org.apache.lucene.store.OpenBufferedIndexOutput.flush(OpenBufferedIndexOutput.java:88)
at
org.elasticsearch.index.store.Store$StoreIndexOutput.flush(Store.java:593)
at
org.apache.lucene.store.OpenBufferedIndexOutput.writeBytes(OpenBufferedIndexOutput.java:75)
at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:43)
at
org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:65)
at
org.apache.lucene.index.FieldsWriter.flushDocument(FieldsWriter.java:116)
at
org.apache.lucene.index.StoredFieldsWriter.finishDocument(StoredFieldsWriter.java:113)
at
org.apache.lucene.index.StoredFieldsWriter$PerDoc.finish(StoredFieldsWriter.java:152)
at
org.apache.lucene.index.DocumentsWriter$WaitQueue.writeDocument(DocumentsWriter.java:1404)
at
org.apache.lucene.index.DocumentsWriter$WaitQueue.add(DocumentsWriter.java:1424)
at
org.apache.lucene.index.DocumentsWriter.finishDocument(DocumentsWriter.java:1043)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2066)
at
org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:567)
at
org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:479)
... 8 more
[20:53:07,013][WARN ][cluster.action.shard ] [Hippolyta] sending failed
shard for [diarioindex][6], node[i5vMP6aKTmyOmql477PtDQ], [R], s[STARTED],
reason [Failed to perform [index] on replica, message
[RemoteTransportException[[Man-Spider][inet[/64.91.231.160:9300]][index/replica]];
nested: IndexFailedEngineException[[diarioindex][6] Index failed for
[diarioindexmap#27403456]]; nested: IOException[No space left on device]; ]]
[20:56:50,368][WARN ][indices.cluster ] [Hippolyta]
[diarioindex][7] failed to start shard
org.elasticsearch.indices.recovery.RecoveryF[20:58:25,227][INFO
][node ] [Hippolyta] {0.19.1}[18948]: stopped
[20:58:25,227][INFO ][node ] [Hippolyta]
{0.19.1}[18948]: closing ...
[20:58:25,241][INFO ][node ] [Hippolyta]
{0.19.1}[18948]: closed
[20:58:27,029][INFO ][node ] [Headknocker]
{0.19.1}[29196]: initializing ...
[20:58:27,050][INFO ][plugins ] [Headknocker] loaded
[analysis-accent, cloud-aws], sites []
[20:58:28,068][INFO ][node ] [Headknocker]
{0.19.1}[29196]: initialized
[20:58:28,068][INFO ][node ] [Headknocker]
{0.19.1}[29196]: starting ...
[20:58:28,120][INFO ][transport ] [Headknocker]
bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/
64.91.231.164:9300]}
[20:58:37,205][INFO ][cluster.service ] [Headknocker]
detected_master [Urthona][C4HUzWlET2u_351fKTR4FA][inet[/64.91.231.161:9300]],
added {[Urthona][C4HUzWlET2u_351fKTR4FA][inet[/64.91.231.161:9300]],},
reason: zen-disco-receive(from master
[[Urthona][C4HUzWlET2u_351fKTR4FA][inet[/64.91.231.161:9300]]])
[20:58:37,207][INFO ][indices.store ] [Headknocker]
[noticiaindex] dangling index, exists on local file system, but not in
cluster metadata, scheduling to delete in [2h]
[20:58:37,208][INFO ][indices.store ] [Headknocker]
[legislacaoindex] dangling index, exists on local file system, but not in
cluster metadata, scheduling to delete in [2h]
[20:58:37,208][INFO ][indices.store ] [Headknocker]
[politicaindex] dangling index, exists on local file system, but not in
cluster metadata, scheduling to delete in [2h]
[20:58:37,208][INFO ][indices.store ] [Headknocker]
[diarioindex] dangling index, exists on local file system, but not in
cluster metadata, scheduling to delete in [2h]
[20:58:37,208][INFO ][indices.store ] [Headknocker]
[jurisindexname] dangling index, exists on local file system, but not in
cluster metadata, scheduling to delete in [2h]

--
Gustavo Maia


(Shay Banon) #2

It seems like you started a node in the cluster that had no local storage
of the data. It's better to configure the gateway.recover_after settings (see
the configuration file) to make sure that this does not happen and that enough
nodes are in the cluster before full recovery happens. Also, configure
discovery.minimum_master_nodes.
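
As a rough illustration of the settings mentioned above, here is a sketch of
what the relevant part of elasticsearch.yml might look like for a seven-node
cluster such as the one described in the first post. The key names are the
ones I recall from the sample configuration file shipped with 0.19.x (note
the zen segment in discovery.zen.minimum_master_nodes). The values are
assumptions chosen for illustration, not recommendations made in this thread,
and they assume all seven nodes are master-eligible data nodes.

```yaml
# Sketch only: values are assumptions for a 7-node cluster.

# Do not start gateway recovery until at least this many nodes have joined
# (assumed value; pick something close to the full cluster size).
gateway.recover_after_nodes: 6

# Once recover_after_nodes is met, wait up to this long for the rest
# of the expected nodes before recovering anyway.
gateway.recover_after_time: 5m

# If this many nodes have joined, recover immediately without waiting
# for recover_after_time.
gateway.expected_nodes: 7

# Require a majority of master-eligible nodes (7 / 2 + 1 = 4) before a
# master can be elected.
discovery.zen.minimum_master_nodes: 4
```

With settings along these lines, a node that restarts before the rest of the
cluster should wait for enough peers instead of forming a cluster on its own
with empty metadata, which is the situation that makes existing indices look
dangling to the nodes that join afterwards.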

Based on the log, you see the message for dangling indices. This means that
when the node started, it joined the cluster, and the cluster did not have
the relevant indices in its metadata, but they existed on disk. Those
indices will only be deleted after 2 hours (sometimes it is valid for such
a scenario to happen).

On Tue, Jun 12, 2012 at 3:25 PM, Gustavo Maia gustavobbmaia@gmail.com wrote:



(Gustavo Maia) #3

Thanks, I will try this configuration.

2012/6/13 Shay Banon kimchy@gmail.com

--
Gustavo Maia

