Logstash at 100% CPU, slow to process Redis queue to Elasticsearch

Hello! I have a Logstash instance that is reading data from a Redis queue, processing it via some grok filters, and then outputting to Elasticsearch.

For a day or two this worked properly: CPU usage was moderate and records were processed quickly. Starting today, however, it is using 100% CPU (on 24 cores!) and processing records at a snail's pace. It seems to read in about 4K records, process them, and then sit at high CPU until it reads more records a couple of minutes later.

I've tried changing ES_HEAP_SIZE and the number of pipeline threads, with no visible effect. I'm not seeing anything unusual in the logs (or really much at all), and the server seems happy otherwise.

input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "filebeat"
    add_field => { "beattype" => "filebeat" }
  }
}

filter {
  mutate {
    rename => { "@metadata" => "metadata" }
  }
}

filter {
  if [type] == "referlog" {
    grok {
      match => { "message" => "%{REFERLOGENTRY}" }
    }
    date {
      locale => "en"
      timezone => "America/Los_Angeles"
      match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
    }
    mutate {
      gsub => [ "txid", "\"", "" ]
      gsub => [ "txid", " ", "" ]
      gsub => [ "email", "\"", "" ]
      gsub => [ "referrer", "\"", "" ]
      gsub => [ "useragent", "\"", "" ]
    }
    if [useragent] != "-" and [useragent] != "" {
      useragent {
        add_tag => [ "UA" ]
        source => "useragent"
      }
    }
    if "UA" in [tags] {
      if [device] == "Other" {
        mutate { remove_field => "device" }
      }
      if [name] == "Other" {
        mutate { remove_field => "name" }
      }
      if [os] == "Other" {
        mutate { remove_field => "os" }
      }
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoIP.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["192.168.0.2:9200"]
    sniffing => false
    manage_template => false
    index => "%{[beattype]}-%{[type]}-%{+YYYY.MM}"
    document_type => "%{[type]}"
  }
}
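
One way to narrow this down, offered as a diagnostic sketch and not a known fix, is to temporarily replace the elasticsearch output with a stdout output using the dots codec, which prints one dot per event. This assumes the bottleneck is either the filter stage or the Elasticsearch output. If the dots still arrive in slow bursts with the CPU pegged, the filters are the likelier culprit; if they stream quickly, the output side deserves a closer look.

```
output {
  # Temporary diagnostic output (an assumption for testing, not part of the
  # original pipeline): prints one "." per processed event instead of indexing.
  stdout { codec => dots }
}
```

Events consumed during such a test are removed from the Redis list and never indexed, so it is probably best run against a copy of the data or a test key.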

Any idea where I can start looking? Again, this was working fine with no changes a day or two ago, and just started going silly today.

What version of things are you running?

Did you find a solution to this? I am seeing something very similar.

Thanks