Fingerprint plugin stalled

Around mid-June we started seeing what feels like a performance issue, even though our data volumes are really low.

We have gzipped logs stored in an S3 bucket. From there, we use Logstash with the S3 input plugin to process them. We use the fingerprint filter, configured as follows:

fingerprint {
  method => "SHA256"
  source => "message"
  target => "fingerprint"
  key => "%{hash_key}"
  base64encode => false
  remove_field => "hash_key"
}

to generate a hash that we use for duplicate detection and prevention. For internal reasons we can see substantial duplicate reporting of logs, and having Logstash sort it out was the easiest solution (until the recent issues). The hash_key field is set further up in the config file, based on how we de-dupe the data streams.
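For context, the surrounding pipeline looks roughly like this (the mutate value and the elasticsearch output are illustrative sketches, not our exact config):

```conf
filter {
  mutate {
    # hash_key is set per data stream; the value here is a placeholder
    add_field => { "hash_key" => "per-stream-key" }
  }
  fingerprint {
    method       => "SHA256"
    source       => "message"
    target       => "fingerprint"
    key          => "%{hash_key}"
    remove_field => "hash_key"
  }
}
output {
  elasticsearch {
    # Using the hash as the document ID means re-ingested duplicates
    # overwrite the same document instead of creating a new one.
    document_id => "%{fingerprint}"
  }
}
```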

We are now seeing compressed log files under 1 MB take days to ingest, if they ever finish at all. In many cases Logstash appears to be stuck with the CPU fully consumed. Previously we could easily ingest hundreds of MB of logs a day without any performance issues, and we haven't changed our configuration file at all.

When we kill Logstash (Ctrl-C), we get the following stalled-thread warning:

20:11:31.010 [Ruby-0-Thread-25: /usr/share/logstash/logstash-core/lib/logstash/shutdown_watcher.rb:31] WARN logstash.shutdownwatcher - {"inflight_count"=>76, "stalling_thread_info"=>{["LogStash::Filters::Fingerprint", {"method"=>"SHA256", "source"=>"message", "target"=>"fingerprint", "key"=>"%{hash_key}", "base64encode"=>"false", "remove_field"=>"hash_key", "id"=>"e99a1fa3f369eb2c222247f3b4378ada64ba62f8-12"}]=>[{"thread_id"=>26, "name"=>"[main]>worker0", "current_call"=>"(eval):8069:in=~'"}]}}

The stalled threads never seem to finish; I've waited an hour or more before hitting Ctrl-C again to kill the whole thing.

Obviously this makes little sense, since taking a hash of an event is quite straightforward. I've run the Logstash config with log.level set to debug, and I can see it processing records ever more slowly until it just stops. Right now the last log line is from 10 minutes ago and the ingest is not yet complete. That last line is:

20:14:44.734 [[main]>worker1] DEBUG logstash.filters.fingerprint - filters/LogStash::Filters::Fingerprint: removing field {:field=>"hash_key"}

This makes me think the fingerprint filter completed successfully, yet if I kill Logstash at this point, the stalled-thread warning still blames fingerprint.

Any assistance in fixing this would be appreciated.

Have you tried using the hot threads API to check what Logstash is doing?
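While Logstash is still running, the hot threads endpoint of the monitoring API will show where the workers are spending their time. Something like this, assuming the default API port 9600:

```shell
# Ask the running Logstash instance for its busiest threads.
# 9600 is the default monitoring API port; adjust if you changed it.
curl -s 'http://localhost:9600/_node/hot_threads?human=true&threads=5'
```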

I was about to suggest that the system is out of entropy, but that doesn't seem likely given that Logstash is consuming a lot of CPU (an entropy-starved process blocks rather than spins).
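For anyone who wants to rule that out anyway: on Linux you can check the kernel's entropy pool directly (a general diagnostic, not specific to this pipeline):

```shell
# Linux keeps a pool of gathered randomness; a process reading the
# blocking /dev/random stalls while the pool is empty.
# A healthy pool is typically in the thousands of bits.
cat /proc/sys/kernel/random/entropy_avail
```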

@magnusbaeck thanks for the suggestion.

The (eval):8069:in=~ part of the stalled-thread message led me down the right path. The line right after the fingerprint filter was a conditional:

if =~ /nasty regex/

Some changes in the input data didn't play well with that regex, and matching was taking arbitrarily long to complete. I've since cleaned it up and we are back to our previous performance levels.
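For anyone hitting the same wall: the regex above isn't shown, but the classic failure mode is catastrophic backtracking, where a pattern with nested quantifiers takes exponential time on inputs that almost match. A small Ruby sketch (Logstash conditionals run on a Ruby regex engine; /(a+)+$/ is an assumed stand-in pattern, not the real one):

```ruby
# /(a+)+$/ is a textbook catastrophic-backtracking pattern. On a run
# of 'a's that is NOT followed by end-of-line, the engine tries
# roughly 2^n ways to split the run between the inner and outer
# repetition before giving up.
nasty = /(a+)+$/
safe  = /a+$/   # matches the same strings, but in linear time

almost = "a" * 10 + "!"   # keep n small; work grows ~2^n for `nasty`

puts nasty.match?("aaa")    # => true
puts safe.match?("aaa")     # => true
puts nasty.match?(almost)   # => false, after ~2^10 backtracking steps
puts safe.match?(almost)    # => false, almost immediately
```

The fix in a Logstash conditional is the same as anywhere else: flatten nested quantifiers and anchor the pattern so a non-matching event fails fast instead of pinning a worker at 100% CPU.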
