How can I resolve a shard reallocation problem?

(Juan Díaz González) #1

I have a problem with the health of my Elasticsearch cluster. I run the following query:

GET _cluster/health?level=indices&pretty

And the result is the following:

"new_gompute_history_2015-10-23_10:10:26": {
         "status": "red",
         "number_of_shards": 5,
         "number_of_replicas": 1,
         "active_primary_shards": 4,
         "active_shards": 8,
         "relocating_shards": 0,
         "initializing_shards": 1,
         "unassigned_shards": 1

I have enough disk space on my nodes (more than 85%).
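As a rough illustration of how the per-index health response above could be filtered, here is a minimal Python sketch. It assumes the JSON returned by `_cluster/health?level=indices` has already been parsed into a dict; the helper name `non_green_indices` and the second index in the sample are hypothetical and only there for the example.

```python
from typing import Dict, List, Tuple


def non_green_indices(health: Dict) -> List[Tuple[str, str, int, int]]:
    """Return (index, status, initializing, unassigned) for each non-green index."""
    rows = []
    for name, info in health.get("indices", {}).items():
        status = info.get("status")
        if status != "green":
            rows.append((
                name,
                status,
                info.get("initializing_shards", 0),
                info.get("unassigned_shards", 0),
            ))
    return sorted(rows)


# Sample shaped like the response quoted above. The first index uses the
# values from the post; the second index is hypothetical.
sample = {
    "indices": {
        "new_gompute_history_2015-10-23_10:10:26": {
            "status": "red",
            "number_of_shards": 5,
            "number_of_replicas": 1,
            "active_primary_shards": 4,
            "active_shards": 8,
            "relocating_shards": 0,
            "initializing_shards": 1,
            "unassigned_shards": 1,
        },
        "hypothetical_green_index": {
            "status": "green",
            "initializing_shards": 0,
            "unassigned_shards": 0,
        },
    }
}

for row in non_green_indices(sample):
    print(row)
```

This only narrows down which indices need attention; it does not say why a shard is stuck.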

Could somebody tell me something about this?

Thanks in advance

(Samir Bennacer) #2

Do you have any exceptions in the logs about shard allocation or shard initialization?

If you check the health again after a few minutes, does it still show a red state?

What happened before the cluster became red? Did you have any OutOfMemory errors? Did you restart any nodes or the whole cluster?
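Re-checking the health after a few minutes can be scripted. The sketch below is only an outline under assumptions: `fetch_health` is a hypothetical callable that returns the parsed cluster health response (for example from an HTTP client of your choice), and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Dict

# Order of cluster health states, worst to best.
RANK = {"red": 0, "yellow": 1, "green": 2}


def wait_for_status(
    fetch_health: Callable[[], Dict],
    wanted: str = "green",
    attempts: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the cluster status is at least `wanted`, or give up."""
    for attempt in range(attempts):
        status = fetch_health().get("status")
        if RANK.get(status, -1) >= RANK[wanted]:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Stand-in for real HTTP calls: the cluster goes red, red, then yellow.
responses = iter([{"status": "red"}, {"status": "red"}, {"status": "yellow"}])
recovered = wait_for_status(
    lambda: next(responses),
    wanted="yellow",
    attempts=3,
    delay_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(recovered)
```

If the status is still red after several checks, the logs are the next place to look.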

(Juan Díaz González) #3

Before the cluster became red, I was only indexing data into Elasticsearch, and something like this appears in the logs:

[2015-11-11 00:00:00,279][WARN ][indices.recovery         ] [Golem] [new_gompute_history_2015-10-23_10:10:26][1] recovery from [[Smasher][RPYw6RGeTDGxg1g9us422Q][bc10-05][inet[/]]] failed
org.elasticsearch.transport.RemoteTransportException: [Smasher][inet[/]][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [new_gompute_history_2015-10-23_10:10:26][1] Phase[1] Execution failed
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(
        at org.elasticsearch.index.shard.service.InternalIndexShard.recover(
        at org.elasticsearch.indices.recovery.RecoverySource.recover(
        at org.elasticsearch.indices.recovery.RecoverySource.access$2600(
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(
        at org.elasticsearch.transport.netty.MessageChannelHandler$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [new_gompute_history_2015-10-23_10:10:26][1] Failed to transfer [0] files with total size of [0b]
        at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(
        ... 9 more
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/juan/nodes/1/indices/new_gompute_history_2015-10-23_10:10:26/1/index/_a_es090_0.pos
        at sun.nio.fs.UnixException.translateToIOException(
        at sun.nio.fs.UnixException.rethrowAsIOException(
        at sun.nio.fs.UnixException.rethrowAsIOException(
        at sun.nio.fs.UnixFileSystemProvider.newFileChannel(
        at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(
        ... 10 more

I didn't restart the cluster, and the problematic shards look like this:

# figure out what shard is the problem
curl localhost:9200/_cat/shards

index                                       shard prirep state          docs   store ip        node  
new_gompute_history_2015-10-23_10:10:26     2     p      INITIALIZING       Alyosha Kravinoff 
new_gompute_history_2015-10-23_10:10:26     2     r      UNASSIGNED
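For anyone who wants to pull the stuck shards out of the `_cat/shards` output automatically, a small Python sketch follows. It assumes the default column order shown above (index, shard, prirep, state first) and treats every state other than STARTED as a problem; the function name `problem_shards` and the STARTED row in the sample are hypothetical.

```python
from typing import List, Tuple

HEADER = ["index", "shard", "prirep", "state"]


def problem_shards(cat_output: str) -> List[Tuple[str, int, str, str]]:
    """Return (index, shard, prirep, state) for shards that are not STARTED."""
    problems = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a shard row.
        if len(fields) < 4 or fields[:4] == HEADER or not fields[1].isdigit():
            continue
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            problems.append((index, int(shard), prirep, state))
    return problems


# The first two rows are copied from the post; the STARTED row is hypothetical.
sample_output = """\
index                                       shard prirep state          docs   store ip        node
new_gompute_history_2015-10-23_10:10:26     2     p      INITIALIZING       Alyosha Kravinoff
new_gompute_history_2015-10-23_10:10:26     2     r      UNASSIGNED
new_gompute_history_2015-10-23_10:10:26     0     p      STARTED        100 1mb 10.0.0.1 Golem
"""

for shard in problem_shards(sample_output):
    print(shard)
```

Only the first four columns are read, because the remaining columns are empty for unassigned shards.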

(Juan Díaz González) #4

I tried to reallocate the shards with the command:

 curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
         "commands": [
             {
                 "allocate": {
                     "index": "'$INDEX'",
                     "shard": '$SHARD',
                     "node": "'$NODE'",
                     "allow_primary": true
                 }
             }
         ]
 }'

But this gives us the following error:

{"error":"RemoteTransportException[[Smasher][inet[/]][cluster:admin/reroute]]; nested: ElasticsearchIllegalArgumentException[[allocate] allocation of [new_gompute_history_2015-10-23_10:10:26][2] on node [Golem][H2dlUy_VQJmcDVb-tWC0YQ][bc10-03][inet[/]] is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][NO(primary shard is not yet active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(total shard limit disabled: [-1] <= 0)][YES(no active primary shard yet)][YES(enough disk for shard on node, free: [71.9gb])][YES(shard not primary or relocation disabled)]]; ","status":400}

We cannot solve this, so we cannot continue indexing into the same index.
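The reroute error above lists one verdict per allocation decider, and only the non-YES entries matter. The Python sketch below picks those out of the message text. It assumes the `[VERDICT(reason)]` layout seen in this error; the function names are hypothetical, and the sample string is a shortened excerpt of the message quoted above.

```python
import re
from typing import List, Tuple

# Matches entries such as [NO(primary shard is not yet active)].
DECISION = re.compile(r"\[(YES|NO|THROTTLE)\((.*?)\)\]")


def parse_decisions(message: str) -> List[Tuple[str, str]]:
    """Return (verdict, reason) pairs in the order they appear."""
    return DECISION.findall(message)


def blocking_reasons(message: str) -> List[str]:
    """Return the reasons of every decider that did not answer YES."""
    return [reason for verdict, reason in parse_decisions(message) if verdict != "YES"]


# Shortened excerpt of the error message from the post.
error_excerpt = (
    "is not allowed, reason: "
    "[YES(shard is not allocated to same node or host)]"
    "[YES(node passes include/exclude/require filters)]"
    "[NO(primary shard is not yet active)]"
    "[YES(below shard recovery limit of [2])]"
    "[YES(enough disk for shard on node, free: [71.9gb])]"
)

print(blocking_reasons(error_excerpt))
```

In this excerpt the only blocking entry is the one saying that the primary shard is not yet active, while the disk check passes.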
