How can I resolve a shard reallocation problem?


(Juan Díaz González) #1

I have a problem with the Elasticsearch cluster health. I run the following query:

GET _cluster/health?level=indices&pretty

The result is the following:

"new_gompute_history_2015-10-23_10:10:26": {
         "status": "red",
         "number_of_shards": 5,
         "number_of_replicas": 1,
         "active_primary_shards": 4,
         "active_shards": 8,
         "relocating_shards": 0,
         "initializing_shards": 1,
         "unassigned_shards": 1
      },

My nodes have enough free disk space; usage is below the 85% watermark.

Could somebody tell me something about this?

Thanks in advance


(Samir Bennacer) #2

Do you have any exceptions in the logs about shard allocation or shard initialization?

If you check the health again after a few minutes, does it still show a red state?

What happened before the cluster became red? Did you have any OutOfMemory errors? Did you restart any nodes or restart the whole cluster?
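
A quick way to check is to grep the logs for those keywords. A sketch (the sample file stands in for your real log; the path and the default location `/var/log/elasticsearch/<cluster>.log` are assumptions, adjust to your setup):

```shell
# Sketch: scan an Elasticsearch log for allocation/recovery/OOM messages.
# /tmp/es-sample.log stands in for your real log file (path is an assumption).
cat > /tmp/es-sample.log <<'EOF'
[2015-11-11 00:00:00,279][WARN ][indices.recovery] [Golem] recovery from [Smasher] failed
[2015-11-11 00:00:01,100][INFO ][cluster.service ] [Golem] added node
EOF
grep -iE 'allocat|recover|outofmemory' /tmp/es-sample.log
```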


(Juan Díaz González) #3

Before the cluster became red, I was only ingesting data into Elasticsearch. Something like this appears in the logs:

[2015-11-11 00:00:00,279][WARN ][indices.recovery         ] [Golem] [new_gompute_history_2015-10-23_10:10:26][1] recovery from [[Smasher][RPYw6RGeTDGxg1g9us422Q][bc10-05][inet[/10.8.5.15:9301]]] failed
org.elasticsearch.transport.RemoteTransportException: [Smasher][inet[/10.8.5.15:9301]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [new_gompute_history_2015-10-23_10:10:26][1] Phase[1] Execution failed
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1151)
        at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:654)
        at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:137)
        at org.elasticsearch.indices.recovery.RecoverySource.access$2600(RecoverySource.java:74)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:440)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:426)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [new_gompute_history_2015-10-23_10:10:26][1] Failed to transfer [0] files with total size of [0b]
        at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:276)
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1147)
        ... 9 more
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/juan/nodes/1/indices/new_gompute_history_2015-10-23_10:10:26/1/index/_a_es090_0.pos
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
        at java.nio.channels.FileChannel.open(FileChannel.java:287)
        at java.nio.channels.FileChannel.open(FileChannel.java:334)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
        at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:172)
        at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
        at org.elasticsearch.index.store.DistributorDirectory.openInput(DistributorDirectory.java:130)
        at org.elasticsearch.index.store.Store$MetadataSnapshot.checksumFromLuceneFile(Store.java:708)
        at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:613)
        at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:596)
        at org.elasticsearch.index.store.Store.getMetadata(Store.java:186)
        at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:146)
        ... 10 more

I didn't restart the cluster, and the problematic shards look like this:

# figure out what shard is the problem
curl localhost:9200/_cat/shards

index                                       shard prirep state          docs   store ip        node  
new_gompute_history_2015-10-23_10:10:26     2     p      INITIALIZING                10.8.5.15 Alyosha Kravinoff 
new_gompute_history_2015-10-23_10:10:26     2     r      UNASSIGNED
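
If you want to script that check, here is a minimal sketch (the sample file below just reuses the two lines above; against a live cluster you would pipe `curl -s localhost:9200/_cat/shards` instead, and host/port are assumptions):

```shell
# Sketch: filter `_cat/shards` output for shard copies that are not STARTED.
# The sample file stands in for a live `curl -s localhost:9200/_cat/shards`.
cat > /tmp/shards.txt <<'EOF'
new_gompute_history_2015-10-23_10:10:26 2 p INITIALIZING 10.8.5.15 Alyosha Kravinoff
new_gompute_history_2015-10-23_10:10:26 2 r UNASSIGNED
EOF
# Column 4 is the shard state; print index, shard number, and state.
awk '$4 != "STARTED" { print $1, $2, $4 }' /tmp/shards.txt
```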

(Juan Díaz González) #4

I tried to reallocate the shards with this command:

 curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
         "commands": [
            {
                "allocate": {
                    "index": "'$INDEX'",
                    "shard": '$SHARD',
                    "node": "'$NODE'",
                    "allow_primary": true
              }
            }
        ]
      }'

But this gives us this error:

{"error":"RemoteTransportException[[Smasher][inet[/10.8.5.15:9301]][cluster:admin/reroute]]; nested: ElasticsearchIllegalArgumentException[[allocate] allocation of [new_gompute_history_2015-10-23_10:10:26][2] on node [Golem][H2dlUy_VQJmcDVb-tWC0YQ][bc10-03][inet[/10.8.5.13:9301]] is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][NO(primary shard is not yet active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(total shard limit disabled: [-1] <= 0)][YES(no active primary shard yet)][YES(enough disk for shard on node, free: [71.9gb])][YES(shard not primary or relocation disabled)]]; ","status":400}

And we cannot resolve it or continue ingesting into the same index.
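
Note the decisive reason in the error string: `NO(primary shard is not yet active)`. The replica cannot be allocated while the primary of shard 2 is still stuck INITIALIZING. One approach (a sketch only, not verified on this cluster; `allow_primary: true` accepts data loss on that shard, and INDEX/SHARD/NODE are placeholder values) is to cancel the stuck copy first and then allocate the primary, with the shell quoting closed correctly this time:

```shell
# Sketch: build a reroute body that cancels the stuck copy, then allocates
# the primary. INDEX/SHARD/NODE are placeholders for illustration.
INDEX='new_gompute_history_2015-10-23_10:10:26'
SHARD=2
NODE='Golem'
BODY='{
  "commands": [
    { "cancel":   { "index": "'$INDEX'", "shard": '$SHARD', "node": "'$NODE'", "allow_primary": true } },
    { "allocate": { "index": "'$INDEX'", "shard": '$SHARD', "node": "'$NODE'", "allow_primary": true } }
  ]
}'
printf '%s\n' "$BODY"
# Against a live cluster (not run here):
#   curl -XPOST 'localhost:9200/_cluster/reroute' -d "$BODY"
```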
