Came into work today and noticed that our Logstash pipeline (on both hosts) had halted with the following error in the logs.
Aug 11 00:22:40 its-elk-p03.uod.otago.ac.nz logstash[20536]: [2021-08-11T00:22:40,141][ERROR][logstash.javapipeline ][main] Pipeline worker error, the pipeline will be stopped {:pipeline_id=>"main", :error=>"(ArgumentError) invalid byte sequence in UTF-8", :exception=>Java::OrgJrubyExceptions::ArgumentError, :backtrace=>["org.jruby.RubyRegexp.match?(org/jruby/RubyRegexp.java:1180)", "usr.share.logstash.logstash_minus_core.lib.logstash.java_pipeline.start_workers(/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:295)"], :thread=>"#<Thread:0xf1261f4 sleep>"}
The other server suffered the same error at 00:22:39.
There don't appear to be any other useful log entries preceding that.
The error occurred very soon after midnight, and we had just put in place the default Logstash log4j2.properties file (it was missing from our deployment, so previously all logging went to stderr, which is captured by journald).
Logstash reads from a number of inputs, but they are all Kafka topics, in broadly two groups (one plain JSON, the other Avro). We restarted Logstash and the same error occurred.
We tried to isolate the error by using kafka-console-consumer etc. to see what messages might have caused it, but all messages appear to be valid JSON.
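Eyeballing console output may not be enough here, since a terminal can render invalid bytes as replacement characters and the message still looks like fine JSON. A stricter check is to decode each raw payload as UTF-8 and report where it fails. A minimal sketch in Python, assuming the raw message bytes have already been dumped somewhere; the function name find_invalid_utf8 is made up for illustration and is not part of Logstash or Kafka:

```python
from typing import Optional, Tuple


def find_invalid_utf8(payload: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in payload, or None if the payload decodes cleanly."""
    try:
        # Strict decoding raises on the first malformed sequence.
        payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, payload[exc.start:exc.end]
    return None


if __name__ == "__main__":
    # Hypothetical sample payloads; real ones would come from the topic.
    samples = [b'{"a": "ok"}', b'{"a": "\xff"}']
    for sample in samples:
        print(sample, find_invalid_utf8(sample))
```

This only tells us whether a payload is valid UTF-8; it says nothing about which Logstash plugin or codec tripped over it.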
We changed the logging level of Logstash to DEBUG.
We then tried disabling the individual inputs so that Logstash was reading from just one at a time (still with DEBUG logging). We have yet to see the error return.
I've since run Logstash normally (i.e. both inputs) with both DEBUG logging and default (INFO) logging, and it has been running fine, so I'm thinking it's due to an issue with log rotation.
I'm gonna babysit it tonight and see if the same occurs.
Question: would configuring Logstash to use its Dead-Letter-Queue functionality help with this? (It's on the plan to implement, just wondering if I should bump it up).