Logstash hangs with OutOfMemoryError

Today I noticed that 5 of my 8 Logstash indexers were "hung", and I see the error below in /var/log/logstash/logstash.err:

  • Running Logstash 1.5.3 with the transport protocol for the elasticsearch output

  • Running Logstash with LS_HEAP_SIZE="2g" set in the /etc/init.d/logstash config

  • Running Elasticsearch 1.7.3

  • I checked all the Elasticsearch data nodes and see no errors, and Marvel reports no issues with Elasticsearch heap (no node is above 70% JVM heap)

Has anyone seen this issue before?

Oct 26, 2015 9:02:55 AM org.elasticsearch.transport.netty.NettyInternalESLogger warn
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at org.elasticsearch.common.netty.channel.socket.nio.SocketReceiveBufferAllocator.newBuffer(SocketReceiveBufferAllocator.java:64)
at org.elasticsearch.common.netty.channel.socket.nio.SocketReceiveBufferAllocator.get(SocketReceiveBufferAllocator.java:41)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:62)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Is that an ES client node that is OOMing, or a Logstash instance?
It looks like ES, but your notes are a little unclear on that aspect.

That log came from a Logstash instance; as noted above, "I see the error below in /var/log/logstash/logstash.err".

What does your config look like?

Pretty simple output:

output {
  # send logstash metrics to marvel cluster instead of production
  if "lumberjack_metric" in [tags] or "syslog_metric" in [tags] or "redis_metric" in [tags] {
    elasticsearch {
      host => ["ip1","ip2"]
      protocol => "http"
      workers => "2"
      cluster => "es_mon"
    }
  }
  else {
    elasticsearch {
      host => ["ip4:9350","ip5:9350","ip6:9350"]
      protocol => "transport"
      cluster => "vcc_cluster"
      workers => "10"
    }
  }
}

This quite probably explains why you're experiencing memory issues. You should check how many transport connections are open; my guess is that it will be in the vicinity of 30. With multiple hosts defined, the plugin spins up "workers" multiplied by hosts. Since your second output uses the transport protocol, that means a Java client for each of those workers, which adds up to a considerable amount of memory.

I recommend switching to the http protocol for the second output as well. With recent releases of the plugin, it does some client round-robin and should give you the throughput you are after without the extreme overhead of 10 Java clients per host.
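
For reference, a minimal sketch of what that second block could look like over http, assuming the same data nodes (the ip4-ip6 placeholders from your config) expose the HTTP API on the default port 9200; the cluster option only applies to the node/transport protocols, so it can be dropped:

  elasticsearch {
    # hypothetical http version of the second output; adjust port/workers as needed
    host => ["ip4","ip5","ip6"]
    protocol => "http"
    port => "9200"
    workers => "2"
  }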

Thanks for the suggestion, Aaron. I will move to the http protocol, since the 2.0 release will default to it anyway.