Optimize logstash tcp input plugin

Hello,

I have 10 Kubernetes clusters forwarding their logs to a Logstash VM (k8s fluentd ---> Logstash port 7000).

Logstash gets to a point where logs are being missed and the source pods keep retrying to get logs through (the errors I see in this case are listed below).

Looking for recommendations to optimize the Logstash TCP input.

Errors on Logstash

[ERROR][logstash.inputs.tcp      ] xxxxxxxxxxxxxxx/x.x.x.x:16591: closing due:
java.net.SocketException: Connection reset
        at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426) ~[?:?]
        at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253) ~[netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132) ~[netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350) ~[netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.65.Final.jar:4.1.65.Final]
        at java.lang.Thread.run(Thread.java:833) [?:?]

Current Input snippet

input {
      tcp {
        codec => fluent
        port => 7000
        tcp_keep_alive => true
      }
}

Changes I tried without success

  • Optimized sysctl on the Logstash VM:
    vm.max_map_count=262144
    fs.file-max=65535
    net.core.netdev_max_backlog=250000
    net.core.netdev_budget=600
    net.ipv4.tcp_mem=16777216 16777216 16777216
  • Increased ring buffers:
ethtool -g ens192
Ring parameters for ens192:
Pre-set maximums:
RX:             4096
RX Mini:        2048
RX Jumbo:       4096
TX:             4096
Current hardware settings:
RX:             4096
RX Mini:        2048
RX Jumbo:       4096
TX:             4096


What is your output?

Sometimes, if your output can't keep up with the rate of events, it will tell Logstash to back off a little, and this can cause issues like the Logstash input queue filling up.
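One way to confirm that is the Logstash monitoring API (this sketch assumes the default api.http.port of 9600 on the Logstash VM). In the events section of the pipeline stats, a queue_push_duration_in_millis that keeps climbing relative to duration_in_millis suggests the inputs are waiting on a full queue, i.e. the bottleneck is downstream of the TCP input:

# Pipeline stats: compare events.queue_push_duration_in_millis with events.duration_in_millis
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'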

Also, what are your pipeline configurations, like workers and batch size? Are you using the default memory queues or persistent queues?


The output is listed below. For the pipeline, I didn't change anything from the defaults.

Output

output {
  if ([cluster]) {
      elasticsearch {
        hosts => ["https://xxxxxx1:9200","https://xxxxxx2:9200","https://xxxxxxx3:9200"]
        user => "admin"
        password => "xxxxxxxxxx"
        index => "%{[cluster]}-%{+YYYY-MM-dd}"
        ssl_certificate_verification => false
        ecs_compatibility => disabled
      }
  }
}

Pipeline file

- pipeline.id: k8sclusters
  path.config: "/usr/share/logstash/files/k8sclusters.conf"
  pipeline.ecs_compatibility: disabled

@leandrojmp I am using the pipeline defaults with memory queues. The VM has 10 CPUs / 64 GB memory. Not sure how to improve performance?

The default batch size for pipelines is pretty low: 125 events per batch.

You may try increasing it and see if things improve.

Adding the line pipeline.batch.size: 250 to your pipeline config would double it.

Not sure what kind of logs you are collecting, but I have some firewalls with a high event rate where I needed to use pipeline.batch.size: 1000 to solve some performance issues.
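As a sketch building on the pipelines.yml you posted above (250 is just the doubled default; the right value depends on your event rate and available heap):

- pipeline.id: k8sclusters
  path.config: "/usr/share/logstash/files/k8sclusters.conf"
  pipeline.ecs_compatibility: disabled
  pipeline.batch.size: 250    # default is 125; larger batches mean bigger bulk requests to Elasticsearch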

Should I change to persistent queues?

I don't think it will help much; I would first try changing the pipeline batch size so Logstash sends more events to Elasticsearch in each request.
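For reference, if you do want to experiment with persistent queues later, they are enabled per pipeline in pipelines.yml (or globally in logstash.yml). This is only a sketch, and queue.max_bytes is an assumption you would size to the disk you can spare:

- pipeline.id: k8sclusters
  path.config: "/usr/share/logstash/files/k8sclusters.conf"
  pipeline.ecs_compatibility: disabled
  queue.type: persisted       # default is "memory"
  queue.max_bytes: 4gb        # assumption: size this to available disk (default is 1024mb)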

I increased pipeline.batch.size from 125 --> 250 --> 500 --> 1000 and increased pipeline.workers to 12. Still no improvement. It takes 1.5-2 days after a Logstash restart to get back to this state.
