Before the cluster went red, I was only ingesting data into Elasticsearch, and something like this appeared in the logs:
[2015-11-11 00:00:00,279][WARN ][indices.recovery ] [Golem] [new_gompute_history_2015-10-23_10:10:26][1] recovery from [[Smasher][RPYw6RGeTDGxg1g9us422Q][bc10-05][inet[/10.8.5.15:9301]]] failed
org.elasticsearch.transport.RemoteTransportException: [Smasher][inet[/10.8.5.15:9301]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [new_gompute_history_2015-10-23_10:10:26][1] Phase[1] Execution failed
at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1151)
at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:654)
at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:137)
at org.elasticsearch.indices.recovery.RecoverySource.access$2600(RecoverySource.java:74)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:440)
at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:426)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [new_gompute_history_2015-10-23_10:10:26][1] Failed to transfer [0] files with total size of [0b]
at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:276)
at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1147)
... 9 more
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/juan/nodes/1/indices/new_gompute_history_2015-10-23_10:10:26/1/index/_a_es090_0.pos
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:172)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.DistributorDirectory.openInput(DistributorDirectory.java:130)
at org.elasticsearch.index.store.Store$MetadataSnapshot.checksumFromLuceneFile(Store.java:708)
at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:613)
at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:596)
at org.elasticsearch.index.store.Store.getMetadata(Store.java:186)
at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:146)
... 10 more
I didn't restart the cluster, and the shards that are causing problems show up like this:
# figure out which shard is the problem
curl localhost:9200/_cat/shards
index shard prirep state docs store ip node
new_gompute_history_2015-10-23_10:10:26 2 p INITIALIZING 10.8.5.15 Alyosha Kravinoff
new_gompute_history_2015-10-23_10:10:26 2 r UNASSIGNED
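With many indices the listing gets long, so a small filter can help pick out the shards that are not started. This is only a minimal sketch: the function name `problem_shards` is hypothetical, and it assumes the output has the header row and column order shown above (index, shard, prirep, state first) and that index names contain no spaces.

```python
def problem_shards(cat_output):
    """Return (index, shard, prirep, state) for every shard that is not STARTED."""
    header = ["index", "shard", "prirep", "state"]
    problems = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank or truncated lines and the header row.
        if len(fields) < 4 or fields[:4] == header:
            continue
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            problems.append((index, shard, prirep, state))
    return problems


# The listing from this post, pasted in as text.
sample = """\
index shard prirep state docs store ip node
new_gompute_history_2015-10-23_10:10:26 2 p INITIALIZING 10.8.5.15 Alyosha Kravinoff
new_gompute_history_2015-10-23_10:10:26 2 r UNASSIGNED
"""

for row in problem_shards(sample):
    print(*row)
```

Only the first four columns are read, because the docs, store, ip and node columns are empty for shards that are initializing or unassigned.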
Trying to allocate the shard manually with the reroute API returns this error:
{"error":"RemoteTransportException[[Smasher][inet[/10.8.5.15:9301]][cluster:admin/reroute]]; nested: ElasticsearchIllegalArgumentException[[allocate] allocation of [new_gompute_history_2015-10-23_10:10:26][2] on node [Golem][H2dlUy_VQJmcDVb-tWC0YQ][bc10-03][inet[/10.8.5.13:9301]] is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][NO(primary shard is not yet active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(total shard limit disabled: [-1] <= 0)][YES(no active primary shard yet)][YES(enough disk for shard on node, free: [71.9gb])][YES(shard not primary or relocation disabled)]]; ","status":400}
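The error above is hard to read because every allocation decider verdict is packed into one line. As a rough sketch, the verdicts can be pulled out with a regular expression so that the blocking ones stand out. The helper names `decider_verdicts` and `blocking_reasons` are hypothetical, and the pattern assumes the `[VERDICT(reason)]` layout seen in this particular message.

```python
import re

# Matches one "[YES(...)]", "[NO(...)]" or "[THROTTLE(...)]" group.
# The reason may itself contain brackets, e.g. "limit of [2]".
VERDICT = re.compile(r"\[(YES|NO|THROTTLE)\((.*?)\)\]")


def decider_verdicts(error_text):
    """Return a list of (verdict, reason) pairs found in the error text."""
    return VERDICT.findall(error_text)


def blocking_reasons(error_text):
    """Return the reasons of every decider that did not answer YES."""
    return [reason for verdict, reason in decider_verdicts(error_text)
            if verdict != "YES"]


# Excerpt of the "reason:" part of the error message in this post.
excerpt = (
    "reason: [YES(shard is not allocated to same node or host)]"
    "[YES(node passes include/exclude/require filters)]"
    "[NO(primary shard is not yet active)]"
    "[YES(below shard recovery limit of [2])]"
)

print(blocking_reasons(excerpt))
```

For the message in this post the only non-YES verdict is "primary shard is not yet active", which seems consistent with the primary of shard 2 being stuck in INITIALIZING in the shard listing.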
We have not been able to fix this, so we cannot continue ingesting into the same index.