Logstash io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 12766298444, max: 12771524608)

Hi All,

We are using Logstash 6.3.1 and have recently been facing issues with logs flowing in from our Filebeat agents. I am attaching the configuration and log samples.
We increased the heap from 4 GB to 6 GB, then 8 GB, and ultimately 12 GB, but Logstash still stops receiving logs about 15 minutes after every restart.
The issue is with a pipeline using a tcp input. We also have a pipeline with a syslog input, and it is parsed just fine.

Logstash log:
    [2021-01-04T07:33:35,865][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 12766298444, max: 12771524608)
        at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:125) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) ~[logstash-input-tcp-5.0.9.jar:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [logstash-input-tcp-5.0.9.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [logstash-input-tcp-5.0.9.jar:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [logstash-input-tcp-5.0.9.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

Logstash Pipeline configuration and statefulset
lle-uat.conf:
----
    input {
      beats {
        port => 5045
        ssl => true
        ssl_certificate_authorities => ["/mnt/ca.cer"]
        ssl_certificate => "/mnt/server.cer"
        ssl_key => "/mnt/server.key"
        ssl_verify_mode => "force_peer"
      }
    }

    filter {
      if "majescopas" in [tags] {
        grok {
          match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp}%{WORD:ErrorCode}:\s%{GREEDYDATA}\s%{GREEDYDATA}%{NUMBER:Threshold}\s%{GREEDYDATA}" }
        }
      }
      if "IIBTransactions" in [tags] {
        json {
          source => "message"
        }
      }
      if "boundarytimepas" in [tags] {
        grok {
          match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp}\s%{LOGLEVEL}\s\[%{GREEDYDATA}]\s\(%{GREEDYDATA}\)\s%{GREEDYDATA}\s%{GREEDYDATA:Hostname}\s%{GREEDYDATA}\s%{GREEDYDATA}\s%{GREEDYDATA}\s%{GREEDYDATA}\s%{GREEDYDATA:RequestFormat}\s%{GREEDYDATA}=%{WORD:Component}\s%{GREEDYDATA}=%{GREEDYDATA:Process}\s%{GREEDYDATA}\s%{GREEDYDATA}=%{GREEDYDATA:Threshold}\s%{GREEDYDATA}=%{GREEDYDATA}" }
        }
      }
    }

    output {
      if "UAT" in [tags] {
        elasticsearch {
          user => "logstash"
          password => "logstash"
          hosts => "elasticsearch:9200"
          index => "logstash-lle-uat-%{+YYYY.MM.dd}"
        }
      }
    }


    lle-dpuat.conf:
    ----
    input {
      tcp {
        port => 5046
        type => syslog
      }
      udp {
        port => 5046
        type => syslog
      }
    }

    filter {
      grok {
        match => { "message" => "\<%{GREEDYDATA}\>%{TIMESTAMP_ISO8601}\s%{DATA:Tag}\s%{GREEDYDATA}" }
      }
      grok {
        match => { "message" => "\<%{NUMBER}\>%{TIMESTAMP_ISO8601}\s%{DATA:Tag}\s\[%{GREEDYDATA}\]\[%{GREEDYDATA}\]\[%{GREEDYDATA}\]\s%{DATA}\(%{DATA:mpgw}\):\s%{DATA}\(%{GREEDYDATA}\)\[%{IP:IncomingIP}\]\s%{DATA}\(%{DATA:gtid}\):\s%{GREEDYDATA},\s\[%{DATA:URL}\]" }
      }
    }

    output {
      if "UATDPLOGS" in [Tag] {
        elasticsearch {
          user => "logstash"
          password => "logstash"
          hosts => "elasticsearch:9200"
          index => "logstash-lle-uatdp-%{+YYYY.MM.dd}"
        }
      }
    }

Logstash Statefulset:
Name: logstash
Namespace: elk
CreationTimestamp: Fri, 08 Feb 2019 20:24:44 +0530
Selector: app=logstash
Labels: app=logstash
Annotations:
Replicas: 1 desired | 1 total
Update Strategy: OnDelete
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=logstash
Containers:
logstash:
Image: docker.elastic.co/logstash/logstash:6.3.1
Port: 5044/TCP
Host Port: 0/TCP
Limits:
cpu: 8
memory: 16Gi
Environment:
LS_JAVA_OPTS: -Xmx12g -Xms12g
Mounts:
/mnt from logstash-certs (rw)
/usr/share/logstash/config/logstash.yml from logstash-config (rw,path="logstash.yml")
/usr/share/logstash/config/pipelines.yml from pipeline-config (rw,path="pipelines.yml")
/usr/share/logstash/pipeline from pipelines (rw)
Volumes:
logstash-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: logstash-config
Optional: false
logstash-certs:
Type: Secret (a volume populated by a Secret)
SecretName: logstash-certs
Optional: false
pipelines:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pipelines
Optional: false
pipeline-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pipeline-config
Optional: false
Volume Claims:
Events:

I would appreciate any help troubleshooting this issue. Please let me know if more input is required from me.

Regards,
Pavan

Please format your code/logs/config using the </> button, or markdown style back ticks. It helps to make things easy to read which helps us help you :slight_smile:

What is in the Logstash log prior to that entry? Can you upgrade? 6.3 reached EOL in 2019 and the latest release is 7.10.

I think your tcp input is running out of memory to contain the data it is reading. That suggests it is never finding a line ending to allow it to flush an event into the pipeline.

What is writing to the tcp input in lle-dpuat.conf? If it is something that strips the newline off a line from the log and instead uses a framing protocol that never contains a newline then the input will buffer data forever.
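To illustrate, here is a simplified sketch in Python (my own simplification, not the actual netty decoder): a line-oriented input only flushes an event when it sees a newline, so a sender that never terminates its frames just makes the buffer grow without bound.

```python
class LineDecoder:
    """Simplified line-based framing: bytes accumulate until a newline arrives."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data: bytes):
        """Append incoming bytes; return any complete events (lines)."""
        self.buffer += data
        events = []
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            events.append(line.decode())
        return events


decoder = LineDecoder()
decoder.feed(b"partial event without terminator")  # returns [] -- bytes stay buffered
decoder.feed(b" ... still no newline")             # buffer keeps growing
events = decoder.feed(b" end\n")                   # newline finally flushes one event
```

If the client speaks a framing protocol with no newlines at all, `feed` never returns anything and the buffer grows until the allocator fails.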

Hi Badger,

The lle-dpuat.conf does not have a beats input; I included the lle-uat.conf to show the difference between the working and non-functional pipelines.
lle-dpuat uses syslog-style tcp/udp inputs; logs from our appliances are written to that pipeline.
Apart from that, we have configured multiline patterns for most of our Filebeats, and they have been parsing logs without any issues so far.
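For reference, the multiline settings on our Filebeats follow this shape (the pattern here is illustrative, not our exact one):

```yaml
# filebeat prospector/input section (illustrative values)
multiline.pattern: '^\d{4}-\d{2}-\d{2}'  # a new event starts with a timestamp
multiline.negate: true
multiline.match: after                   # non-matching lines append to the previous event
```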

Also, what does this entry in the Logstash logs mean?

[quote="pavank, post:1, topic:260067"]
An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
[/quote]

Regards,
Pavan

Apologies about the formatting; I tried to edit the original post and will post the logs and config again in the proper format.
Prior to the out-of-direct-memory error I see this entry:

 [2021-01-05T10:07:15,372][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 0
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 0
    at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.16.jar:?]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
    ... 8 more

Regards,
Pavan

The exception is not coming from a beats input, it is coming from a tcp input.
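The InvalidFrameProtocolException you posted, however, does come from the beats input on 5045. Each beats (lumberjack) frame starts with a one-byte protocol version ('1' or '2') followed by a one-byte frame type; if something that does not speak the protocol connects to that port (plain text, a health check, a TLS handshake against a plaintext listener), the parser reads an unexpected byte and bails out. A rough sketch of that check (my hypothetical simplification — the real parsing lives in org.logstash.beats.BeatsParser):

```python
# Hypothetical simplification of the beats (lumberjack) frame header check.
VALID_VERSIONS = {ord("1"), ord("2")}
VALID_FRAME_TYPES = {
    ord("W"),  # window size
    ord("D"),  # data
    ord("J"),  # json
    ord("C"),  # compressed
    ord("A"),  # ack
}


def check_frame_header(data: bytes) -> str:
    """Validate the first two bytes of a beats frame: version, then frame type."""
    if len(data) < 2:
        raise ValueError("need at least version and frame-type bytes")
    version, frame_type = data[0], data[1]
    if version not in VALID_VERSIONS:
        raise ValueError(f"Invalid version of beats protocol: {version}")
    if frame_type not in VALID_FRAME_TYPES:
        raise ValueError(f"Invalid Frame Type, received: {frame_type}")
    return chr(frame_type)


check_frame_header(b"2W\x00\x00\x00\x01")  # a well-formed window-size frame header
```

So "Invalid Frame Type, received: 0" suggests something other than a healthy Filebeat is connecting to 5045.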

Hey Badger,

Thank you for this. Today I got the same out-of-direct-memory error with a different description.
My Logstash is running as a statefulset in a K8s cluster, and this is what I see in my Logstash logs:

[2021-01-11T13:44:43,175][INFO ][org.logstash.beats.BeatsHandler] [local: 172.30.207.147:5045, remote: 10.184.250.221:45157] Handling exception: failed to allocate 41943040 byte(s) of direct memory (used: 8442574942, max: 8476557312)

In this entry the local IP corresponds to the node endpoint in my Logstash service, and the remote IP is the IP of one of my K8s worker nodes. Port 5045 is the port I have configured for the beats input in the pipeline.

I checked and can confirm that the Logstash pod is not running on the mentioned worker node.
Moreover, the memory limit mentioned in the log above matches the heap limit I have set under LS_JAVA_OPTS. Here is the container spec from my Logstash statefulset:

  - env:
    - name: LS_JAVA_OPTS
      value: -Xmx8g -Xms8g
    image: docker.elastic.co/logstash/logstash:6.3.1
    imagePullPolicy: IfNotPresent
    name: logstash
    ports:
    - containerPort: 5044
      protocol: TCP
    resources:
      limits:
        cpu: "6"
        memory: 24Gi
      requests:
        cpu: "4"
        memory: 24Gi
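From what I have read (an assumption on my part, as JVM defaults vary), netty's direct-memory cap follows -Xmx unless -XX:MaxDirectMemorySize is set explicitly, which would explain why the limit in the error (~8 GB) matches my heap. It could be capped independently of the heap, e.g.:

```yaml
# illustrative: cap direct buffers separately from the 8 GB heap
- name: LS_JAVA_OPTS
  value: -Xmx8g -Xms8g -XX:MaxDirectMemorySize=2g
```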

Could you help me understand why a worker node's IP shows up as the remote IP, and why it impacts Logstash?
The log flow is affected because of this issue. :expressionless: