Logstash keeps throwing exceptions every few seconds


(Robin Guo) #1

Background:

We have almost 200 Beats clients and an ELK cluster consisting of 20 ES data nodes, 3 master nodes, and 2 client nodes.
Logstash is a 4-node cluster behind HAProxy.

The architecture looks like this:

metricbeat/filebeat/rsyslog/nxlog -> haproxy -> logstash cluster (9 pipelines with PQ) -> ES cluster
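
On the Beats side, each agent points at the HAProxy frontend rather than at Logstash directly. A minimal sketch of the relevant metricbeat.yml output section (the frontend address below is a placeholder, not our real VIP):

# metricbeat.yml (sketch; haproxy-vip is a placeholder for the HAProxy frontend)
output.logstash:
  hosts: ["haproxy-vip:5044"]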

metricbeat env

version: v5.5.1

ES env

version: v5.5.1

logstash env

java version

180913231203 root@robintest01 logstash # java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

logstash version

180913231304 root@robintest01 bin # ./logstash -V
logstash 6.2.3

memory

180913231402 root@robintest01 bin # free -m
              total        used        free      shared  buff/cache   available
Mem:          15877        1603        9289         827        4984       13062
Swap:             0           0           0

CPU

180913231515 root@robintest01 bin # cat /proc/cpuinfo |grep -i 'processor'|wc -l
4

Logstash conf

180913231738 root@robintest01 bin # cat /etc/logstash/logstash.yml|grep -v '#'
path.data: /data/logstash
config.reload.automatic: true
config.reload.interval: 60s
log.level: info
path.logs: /var/log/logstash

logstash pipeline

180913231320 root@robintest01 bin # cat /etc/logstash/pipelines.yml 
# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html



#- pipeline.id: main
#  path.config: "/etc/logstash/conf.d/*.conf"

- pipeline.id: filebeat
  path.config: /etc/logstash/conf.d/filebeat.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb

- pipeline.id: metricbeat
  path.config: /etc/logstash/conf.d/metricbeat.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb

- pipeline.id: tlsbeat
  path.config: /etc/logstash/conf.d/tlsbeat.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb  

- pipeline.id: rsyslog-linux
  path.config: /etc/logstash/conf.d/rsyslog-linux.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb  

- pipeline.id: nxlog-windows
  path.config: /etc/logstash/conf.d/nxlog-windows.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb  

- pipeline.id: rsyslog-cisco-asa
  path.config: /etc/logstash/conf.d/rsyslog-cisco-asa.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 10240mb  

- pipeline.id: rsyslog-cisco-isr
  path.config: /etc/logstash/conf.d/rsyslog-cisco-isr.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb  


- pipeline.id: rsyslog-cisco-switch
  path.config: /etc/logstash/conf.d/rsyslog-cisco-switch.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb  


- pipeline.id: rsyslog-cisco-meraki
  path.config: /etc/logstash/conf.d/rsyslog-cisco-meraki.conf
  pipeline.batch.size: 2000
  queue.type: persisted
  queue.max_bytes: 5120mb

pipeline for metricbeat

180913231814 root@robintest01 conf.d # cat metricbeat.conf
#beats logstash for pipeline metricbeat

input {
  beats {
    port => 5044
    client_inactivity_timeout => 0
  }
}


filter {
  if "_jsonparsefailure" in [tags] {
    drop { }
  }
}

output {
  file {
    path => "/tmp/metricbet.txt"
  }
}
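
The file output above is just what this pipeline writes right now; per the architecture the events ultimately go to the ES cluster, so the output would eventually look roughly like this (the hosts below are placeholders, not our real client nodes):

output {
  elasticsearch {
    # placeholder addresses standing in for the ES client nodes
    hosts => ["http://10.0.0.11:9200", "http://10.0.0.12:9200"]
    index => "metricbeat-%{+YYYY.MM.dd}"
  }
}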

logstash process

180913231327 root@robintest01 bin # ps -ef|grep logstash
root      9015  3755  0 23:14 pts/0    00:00:00 grep --color logstash
logstash 39019     1 11 21:34 ?        00:11:58 /bin/java -Xms8g -Xmx8g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -Djava.net.preferIPv4Stack=true -Djava.io.tmpdir=/tmp -Djava.awt.headless=true -Dfile.encoding=UTF-8 -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintClassHistogram -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/logstash/gclog.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2000k -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.5.jar:/usr/share/logstash/logstash-core/lib/jars/guava-22.0.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/javac-shaded-9-dev-r4023-3.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.1.13.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.25.jar org.logstash.Logstash --path.settings /etc/logstash

(Robin Guo) #2

Logstash log exceptions (repeating every few seconds):

8.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
        at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.10.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        ... 8 more
[2018-09-13T23:05:55,286][INFO ][org.logstash.beats.BeatsHandler] [local: 10.91.86.150:5044, remote: 10.91.86.147:46674] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 108
[2018-09-13T23:05:55,286][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 108
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$300(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext$4.run(AbstractChannelHandlerContext.java:236) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 108
        at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.10.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        ... 10 more
[2018-09-13T23:05:57,543][INFO ][org.logstash.beats.BeatsHandler] [local: 10.91.86.150:5044, remote: 10.91.86.147:46772] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
[2018-09-13T23:05:57,543][WARN ][io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.18.Final.jar:4.1.18.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 101
        at org.logstash.beats.BeatsParser.decode(BeatsParser.java:92) ~[logstash-input-beats-5.0.10.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-all-4.1.18.Final.jar:4.1.18.Final]
        ... 8 more

(Robin Guo) #3

Could someone take a look at this?

Best Regards


(Ry Biesemeyer) #4

Is there something that does not speak the Beats/Lumberjack protocol sending messages to the port the Beats input is listening on?


(Robin Guo) #5

Hi @yaauie,
We use Metricbeat to collect metrics on our Linux servers and ship them to HAProxy, which forwards them to Logstash listening on port 5044 with the beats protocol.

metricbeat -> HAProxy (tcp:5044) -> Logstash (beats input on 5044)

There is no issue with the TCP or UDP listeners on Logstash itself.

Thanks in advance


(Ry Biesemeyer) #6

Is your HAProxy performing health checks? If so, how is this configured? If they're doing anything more than a TCP health check (which looks for a SYN/ACK, and doesn't complete opening a TCP connection), they may be sending bytes that are not valid Beats protocol.


(Robin Guo) #7

Hi @yaauie

Our HAProxy settings are as follows:

@/etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     50000
    user        haproxy
    group       haproxy
    daemon
    stats socket 127.0.0.1:14567



defaults
    log                     global
    mode                    tcp
    option                  dontlognull
    option                  redispatch
    retries                 3
    #timeout http-request    60s
    timeout queue           300s
    timeout connect         300s
    timeout client          300s
    timeout server          300s
    #timeout http-keep-alive 60s
    timeout check           60s
    maxconn                 50000



#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend  metricbeat
     bind *:5044
     mode tcp
     timeout client 300s
     default_backend     metricbeat2ES
     maxconn 2000



#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend metricbeat2ES
    mode     tcp
    timeout server 300s
    balance  roundrobin
    server  robineslogstash01 10.0.0.1:5044    check  maxconn 2000
    server  robineslogstash02 10.0.0.2:5044    check  maxconn 2000
    server  robineslogstash03 10.0.0.3:5044    check  maxconn 2000
    server  robineslogstash04 10.0.0.4:5044    check  maxconn 2000

(Ry Biesemeyer) #8

It looks like you need to add option tcp-check to your backend section to tell it to do a simple TCP check.

https://www.haproxy.com/documentation/aloha/10-0/traffic-management/lb-layer7/health-checks/#check-a-tcp-port
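
For reference, with option tcp-check and no tcp-check rules, HAProxy performs a plain layer-4 check: it opens and closes a TCP connection without sending any payload. Applied to the backend you posted, that would look roughly like this:

backend metricbeat2ES
    mode     tcp
    option   tcp-check
    timeout server 300s
    balance  roundrobin
    server  robineslogstash01 10.0.0.1:5044    check  maxconn 2000
    server  robineslogstash02 10.0.0.2:5044    check  maxconn 2000
    server  robineslogstash03 10.0.0.3:5044    check  maxconn 2000
    server  robineslogstash04 10.0.0.4:5044    check  maxconn 2000

As a sanity check: Beats frame types are single ASCII bytes, and the 101 and 108 in your log decode to 'e' and 'l', which is consistent with something sending plain text rather than Beats frames to port 5044.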


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.