In mid-June or so we started seeing what feels like a performance issue, even though our data volumes are really low.
We have gzipped logs stored in an S3 bucket. From there, we use Logstash with the S3 input plugin to process the logs. We use the fingerprint filter configured as follows:
fingerprint {
  method => "SHA256"
  source => "message"
  target => "fingerprint"
  key => "%{hash_key}"
  base64encode => false
  remove_field => "hash_key"
}
to generate a hash that we use for duplicate detection and prevention. For internal reasons we can have substantial duplicate reporting of logs, and having Logstash sort it out was the easiest solution (until the recent issues). The hash_key field is set further up in the config file based on how we are de-duping the data streams.
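For context, the rest of the pipeline is wired up roughly like the sketch below. The bucket name, prefix, and the mutate block that sets hash_key are placeholders standing in for our real values, which vary per data stream:

input {
  s3 {
    bucket => "example-log-bucket"   # placeholder; the real bucket holds the gzipped logs
    prefix => "logs/"                # placeholder prefix
  }
}

filter {
  # Placeholder for the per-stream logic that sets hash_key; in reality this is
  # driven by conditionals on the incoming events.
  mutate {
    add_field => { "hash_key" => "example-stream-key" }
  }

  # ...followed by the fingerprint block shown above.
}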
We are now seeing compressed log files under 1 MB take days to ingest, if they ever finish at all. In many cases Logstash appears to be stuck with the CPU fully consumed. Previously we could easily ingest hundreds of MB of logs a day without any performance issues, and we haven't changed our configuration file at all.
When killing Logstash (Ctrl-C) we get the following stalled-thread warning:
20:11:31.010 [Ruby-0-Thread-25: /usr/share/logstash/logstash-core/lib/logstash/shutdown_watcher.rb:31] WARN logstash.shutdownwatcher - {"inflight_count"=>76, "stalling_thread_info"=>{["LogStash::Filters::Fingerprint", {"method"=>"SHA256", "source"=>"message", "target"=>"fingerprint", "key"=>"%{hash_key}", "base64encode"=>"false", "remove_field"=>"hash_key", "id"=>"e99a1fa3f369eb2c222247f3b4378ada64ba62f8-12"}]=>[{"thread_id"=>26, "name"=>"[main]>worker0", "current_call"=>"(eval):8069:in `=~'"}]}}
The stalled threads never seem to finish; I've waited an hour or more before hitting Ctrl-C again to kill the whole thing.
Obviously this makes little sense, since just hashing the message field should be quick. I've run the Logstash config with log.level set to debug, and I can see it processing records ever more slowly until it just stops. Right now the last log line is from 10 minutes ago and the ingest is still not complete. That last line is:
20:14:44.734 [[main]>worker1] DEBUG logstash.filters.fingerprint - filters/LogStash::Filters::Fingerprint: removing field {:field=>"hash_key"}
This makes me think the fingerprint filter completed successfully; however, if I killed Logstash now, I'd see the stalled-thread warning complaining about fingerprint.
Any assistance in fixing this would be appreciated.