Logstash keeps throwing exceptions every few seconds

Background:

We have almost 200 Beats clients and an ELK cluster consisting of 20 ES data nodes, 3 master nodes, and 2 client nodes.
Logstash is a cluster of 4 nodes behind HAProxy.

The architecture looks like this:

metricbeat/filebeat/rsyslog/nxlog -> haproxy -> logstash cluster (9 pipelines with PQ) -> ES cluster

metricbeat env

version: v5.5.1

ES env

version: v5.5.1

logstash env

java version

180913231203 root@robintest01 logstash # java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

logstash version

180913231304 root@robintest01 bin # ./logstash -V
logstash 6.2.3

memory

180913231402 root@robintest01 bin # free -m
              total        used        free      shared  buff/cache   available
Mem:          15877        1603        9289         827        4984       13062
Swap:             0           0           0

CPU

180913231515 root@robintest01 bin # cat /proc/cpuinfo |grep -i 'processor'|wc -l
4

Logstash conf

180913231738 root@robintest01 bin # cat /etc/logstash/logstash.yml|grep -v '#'
path.data: /data/logstash
config.reload.automatic: true
config.reload.interval: 60s
log.level: info
path.logs: /var/log/logstash

logstash pipeline

180913231320 root@robintest01 bin # cat /etc/logstash/pipelines.yml 
# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html



#- pipeline.id: main
#  path.config: "/etc/logstash/conf.d/*.conf"

- pipeline.id: filebeat
  path.config: /etc/logstash/conf.d/filebeat.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb

- pipeline.id: metricbeat
  path.config: /etc/logstash/conf.d/metricbeat.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb

- pipeline.id: tlsbeat
  path.config: /etc/logstash/conf.d/tlsbeat.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb  

- pipeline.id: rsyslog-linux
  path.config: /etc/logstash/conf.d/rsyslog-linux.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb  

- pipeline.id: nxlog-windows
  path.config: /etc/logstash/conf.d/nxlog-windows.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb  

- pipeline.id: rsyslog-cisco-asa
  path.config: /etc/logstash/conf.d/rsyslog-cisco-asa.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb  

- pipeline.id: rsyslog-cisco-isr
  path.config: /etc/logstash/conf.d/rsyslog-cisco-isr.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb  


- pipeline.id: rsyslog-cisco-switch
  path.config: /etc/logstash/conf.d/rsyslog-cisco-switch.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb  


- pipeline.id: rsyslog-cisco-meraki
  path.config: /etc/logstash/conf.d/rsyslog-cisco-meraki.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb

pipeline for metricbeat

180913231814 root@robintest01 conf.d # cat metricbeat.conf
#beats logstash for pipeline metricbeat

input {
  beats {
    port => 5044
    client_inactivity_timeout => 0
  }
}


filter {
  
  if "_jsonparsefailure" in [tags] {
        drop { }
  }
  
  
  
}
 


output {
	file {
		path => "/tmp/metricbet.txt"
	}
}

logstash process

180913231327 root@robintest01 bin # ps -ef|grep logstash
root      9015  3755  0 23:14 pts/0    00:00:00 grep --color logstash
logstash 39019     1 11 21:34 ?        00:11:58 /bin/java -Xms8g -Xmx8g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -Djava.net.preferIPv4Stack=true -Djava.io.tmpdir=/tmp -Djava.awt.headless=true -Dfile.encoding=UTF-8 -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintClassHistogram -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/logstash/gclog.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2000k -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.5.jar:/usr/share/logstash/logstash-core/lib/jars/guava-22.0.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/javac-shaded-9-dev-r4023-3.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.1.13.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.25.jar org.logstash.Logstash --path.settings /etc/logstash

Logstash log exception:

8.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
        at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.10.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        ... 8 more
[2018-09-13T23:05:55,286][INFO ][org.logstash.beats.BeatsHandler] [local: 10.91.86.150:5044, remote: 10.91.86.147:46674] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 108
[2018-09-13T23:05:55,286][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 108
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$300(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext$4.run(AbstractChannelHandlerContext.java:236) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 108
        at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.10.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        ... 10 more
[2018-09-13T23:05:57,543][INFO ][org.logstash.beats.BeatsHandler] [local: 10.91.86.150:5044, remote: 10.91.86.147:46772] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
[2018-09-13T23:05:57,543][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
        at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.10.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        ... 8 more

Could someone take a look at this?

Best Regards

Is there something that does not speak the Beats/Lumberjack protocol sending messages to the port the Beats input is listening on?

Hi @yaauie,
We use Metricbeat to collect metrics on our Linux servers and ship them to HAProxy, which forwards them to Logstash listening on port 5044 with the Beats protocol.

metricbeat->HAproxy(tcp:5044)->logstash(input(beats:5044))

There is no issue with the TCP or UDP listeners on Logstash.

Thanks in advance

Is your HAProxy performing health checks? If so, how is this configured? If they're doing anything more than a TCP health check (which looks for a SYN/ACK, and doesn't complete opening a TCP connection), they may be sending bytes that are not valid Beats protocol.
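For illustration, a hypothetical backend with a layer-7 health check like the snippet below (names are made up) would send a plain-text HTTP request to the Beats port on every check; the Beats input would read those ASCII bytes as frame types and raise InvalidFrameProtocolException errors much like the ones above:

backend beats_example
    mode tcp
    # hypothetical layer-7 check: on every health check HAProxy sends
    # "OPTIONS / HTTP/1.0" as plain text to the Beats port
    option httpchk
    server logstash01 10.0.0.1:5044 check

A check that only opens and closes the TCP connection sends no payload, so it cannot produce invalid Beats frames.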

Hi @yaauie

Our HAProxy settings are as follows:

@/etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     50000
    user        haproxy
    group       haproxy
    daemon
    stats socket 127.0.0.1:14567



defaults
    log                     global
    mode                    tcp
    option                  dontlognull
    option                  redispatch
    retries                 3
    #timeout http-request    60s
    timeout queue           300s
    timeout connect         300s
    timeout client          300s
    timeout server          300s
    #timeout http-keep-alive 60s
    timeout check           60s
    maxconn                 50000



#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend  metricbeat
     bind *:5044
     mode tcp
     timeout client 300s
     default_backend     metricbeat2ES
     maxconn 2000



#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend metricbeat2ES
    mode     tcp
    timeout server 300s
    balance  roundrobin
    server  robineslogstash01 10.0.0.1:5044    check  maxconn 2000
    server  robineslogstash02 10.0.0.2:5044    check  maxconn 2000
    server  robineslogstash03 10.0.0.3:5044    check  maxconn 2000
    server  robineslogstash04 10.0.0.4:5044    check  maxconn 2000

It looks like you need to add option tcp-check to your backend section to tell it to do a simple TCP check.

https://www.haproxy.com/documentation/aloha/10-0/traffic-management/lb-layer7/health-checks/#check-a-tcp-port
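Applied to the backend above, the change would look something like this (only the option tcp-check line is new; the rest is taken from the pasted config):

backend metricbeat2ES
    mode     tcp
    # with option tcp-check and no tcp-check rules, the health check is a
    # plain connect/close and sends no application data to the Beats port
    option   tcp-check
    timeout server 300s
    balance  roundrobin
    server  robineslogstash01 10.0.0.1:5044    check  maxconn 2000
    server  robineslogstash02 10.0.0.2:5044    check  maxconn 2000
    server  robineslogstash03 10.0.0.3:5044    check  maxconn 2000
    server  robineslogstash04 10.0.0.4:5044    check  maxconn 2000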
