Shards UNASSIGNED and OOM in logs

Hi,

We are running Elasticsearch 1.3.2 on a two-node cluster in a production
environment.
Currently we have only a few indices, holding about 1,000 documents.
The mapper-attachments plugin is used in our mappings.
Some shards on one cluster node (SERVER 1) are stuck in the following states:

{
  "state": "INITIALIZING",
  "primary": false,
  "node": "KkuMLz0_TKONN77uOoWE7A",
  "relocating_node": null,
  "shard": 3,
  "index": "programma_mare-trashcan"
}

{
  "state": "UNASSIGNED",
  "primary": false,
  "node": null,
  "relocating_node": null,
  "shard": 0,
  "index": "programma_mare_index_initial"
}
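
For reference, the snippets above come from the cluster state routing table. Assuming the default HTTP port 9200 on one of the nodes, this is roughly how we pull the shard states and a per-shard overview:

curl -s 'localhost:9200/_cluster/state/routing_table?pretty'
curl -s 'localhost:9200/_cat/shards?v'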

There are a lot of warning messages like the following:

SERVER 1:

org.elasticsearch.transport.RemoteTransportException: [inl-cdcl-ind2][inet[/10.73.193.51:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [context][1] Phase[1] Execution failed
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1078)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:636)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:135)
    at org.elasticsearch.indices.recovery.RecoverySource.access$2500(RecoverySource.java:72)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:440)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:426)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [context][1] Failed to transfer [0] files with total size of [0b]
    at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:280)
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1074)
    ... 9 more
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:658)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
    at sun.nio.ch.IOUtil.read(IOUtil.java:195)
    at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:700)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:685)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:176)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
    at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:96)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:346)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:907)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:753)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98)
    at org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:124)
    at org.elasticsearch.index.store.Store.access$300(Store.java:74)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:442)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:433)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:144)
    at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:145)
    ... 10 more
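
The root cause above is "OutOfMemoryError: Direct buffer memory", so we have started looking at the JVM buffer pools. If our reading of the nodes stats output is right, the figures to watch are jvm.buffer_pools.direct.used_in_bytes and total_capacity_in_bytes on each node (again assuming port 9200):

curl -s 'localhost:9200/_nodes/stats/jvm?pretty'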

SERVER 2:

[2014-11-02 00:02:12,250][WARN ][cluster.action.shard ] [inl-cdcl-ind2] [context][1] received shard failed for [context][1], node[KkuMLz0_TKONN77uOoWE7A], [R], s[INITIALIZING], indexUUID [4GkRH6dNR--kmx0FtXwcRA], reason [Failed to start shard, message [RecoveryFailedException[[context][1]: Recovery failed from [inl-cdcl-ind2][2eBj7ijRS82V8md2a78U-A][inl-cdcl-ind2][inet[/10.73.193.51:9300]] into [inl-cdcl-ind1][KkuMLz0_TKONN77uOoWE7A][inl-cdcl-ind1][inet[/10.73.193.50:9300]]]; nested: RemoteTransportException[[inl-cdcl-ind2][inet[/10.73.193.51:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[context][1] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[context][1] Failed to transfer [0] files with total size of [0b]]; nested: OutOfMemoryError[Direct buffer memory]; ]]

...

[2014-11-01 00:02:28,251][WARN ][cluster.action.shard ] [inl-cdcl-ind2] [my-index][3] received shard failed for [my-index][3], node[KkuMLz0_TKONN77uOoWE7A], [R], s[INITIALIZING], indexUUID [Y9nKzEPuRSGPsYhjisxYPQ], reason [master [inl-cdcl-ind2][2eBj7ijRS82V8md2a78U-A][inl-cdcl-ind2][inet[/10.73.193.51:9300]] marked shard as initializing, but shard is marked as failed, resend shard failure]
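
If the fix is simply to give the JVM more direct memory, is something like the following the right way to do it? This is only a sketch based on the standard startup scripts, and we have not tried it yet; the 1g value is just an example:

# in bin/elasticsearch.in.sh, or in the environment of the user running the node
export ES_DIRECT_SIZE=1g    # the stock script turns this into -XX:MaxDirectMemorySize=1g
# or pass the flag directly when starting the node
ES_JAVA_OPTS="-XX:MaxDirectMemorySize=1g" bin/elasticsearch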

How can we identify the cause of these failures and get the shards assigned again?

Thanks,
Regards

Matteo Cremolini
