Pipeline stalls and errors in logstash

We are currently testing Logstash, Elasticsearch, Kibana and Filebeat to ship some IIS log files. Each component can talk to the others correctly, but when Logstash processes the logs to send to Elasticsearch we get this error:

CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}
Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

Beats input: the pipeline is blocked, temporary refusing new connection. {:reconnect_backoff_sleep=>0.5, :level=>:warn}

Some logs are processed, but we keep getting these warnings.

We are using CentOS 6.7 on two Hyper-V VMs: Elasticsearch on one VM, with Kibana and Logstash on the other. Filebeat is installed on the IIS server (Windows Server 2012 R2).

The Elasticsearch VM has 2 cores and 4096 MB of RAM; the Logstash VM has 2 cores and 2048 MB of RAM. The logs we are testing are only around 1 MB in size, with 6 log files in total.

Without a grok filter in place there are no errors and Logstash flies through the logs.

The grok filter is:

filter {
  if [message] =~ "^#" {
    drop {}
  }
  if [type] == "filebeat iss log server name" {
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp}%{SPACE}%{IPV4:host}%{SPACE}%{WORD:method}%{SPACE}%{PATH:apppath}%{SPACE}%{NOTSPACE:query}%{SPACE}%{NUMBER:port}%{SPACE}%{NOTSPACE}%{SPACE}%{IPV4:client}%{SPACE}%{NOTSPACE:useragent}%{SPACE}%{URI:referer}%{SPACE}%{NUMBER:status}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER:timetaken}"]
    }
    date {
      match => ["log_timestamp", "YYYY-MM-dd HH:mm:ss"]
      # timezone takes a string, not an array
      timezone => "Europe/London"
    }
  }
}
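As a general grok tuning sketch (not a confirmed fix for this thread): an unanchored grok pattern can backtrack heavily on lines that don't match, so anchoring the pattern to the start of the line with `^` is a commonly suggested optimization:

```
grok {
  # same pattern as above, anchored with ^ so non-matching lines fail fast
  match => ["message", "^%{TIMESTAMP_ISO8601:log_timestamp}%{SPACE}%{IPV4:host}%{SPACE}%{WORD:method}%{SPACE}%{PATH:apppath}%{SPACE}%{NOTSPACE:query}%{SPACE}%{NUMBER:port}%{SPACE}%{NOTSPACE}%{SPACE}%{IPV4:client}%{SPACE}%{NOTSPACE:useragent}%{SPACE}%{URI:referer}%{SPACE}%{NUMBER:status}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER:timetaken}"]
}
```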

Any ideas on how to improve this so that all the logs are processed?

We can provide the yml and conf files if needed.


Hi, I am also facing the same issue. I am trying with Filebeat 1.1.2 and Logstash 2.2.2. Once the :message=>"Beats input: the pipeline is blocked, temporary refusing new connection.", :reconnect_backoff_sleep=>0.5, :level=>:warn} message is shown in Logstash, it never recovers after that. If I restart Logstash, everything seems to work fine for some time. Not sure whether the issue is with Logstash or Filebeat. I have separate VMs for all of them.
After going through https://github.com/elastic/beats/issues/878
it seems they have fixed the issue in Filebeat, but it still exists in Logstash. For me, restarting Filebeat does not work; only when I restart Logstash does it work, and then only temporarily.
I have even tried setting
congestion_threshold => 60
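For reference, congestion_threshold is an option on the beats input in Logstash 2.x; the sketch below shows where it goes. The port is an assumption (5044 is the conventional Beats port), so adjust it to match your Filebeat output:

```
input {
  beats {
    port => 5044                 # assumed port; match your filebeat output config
    congestion_threshold => 60   # seconds to wait on a blocked pipeline before tripping the breaker
  }
}
```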

In the log I can find the exception message below:

CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}

Something that instantly solved the problem for me was to disable the Logstash stdout output plugin (output { stdout { codec => rubydebug }}), which I had forgotten to remove after testing. Maybe you have the same issue.
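In other words, leaving only the elasticsearch output. A minimal sketch; the hosts value is an assumption for illustration:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]    # assumed address of your Elasticsearch node
  }
  # stdout { codec => rubydebug }  # debug-only output; remove after testing
}
```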

I was able to somewhat fix the issue after some 2 days of the ELK stack being down in production.
Looking at the CPU usage of Logstash, I found that during high-volume transactions it moved above 400% (4 CPUs) and stayed there, which should not have happened. There was no error in the logs other than the beats exception and socket closed, which is clearly misleading.
Then I removed all my logs and added them back one after another until I found the problem.
I had a bunch of high-volume log files for which I used to check whether a certain key had some value, then send the event to Elasticsearch or else drop it. I moved this logic to Filebeat's include parameter, so fewer logs are coming to Logstash now, and my CPU usage varies between about 1% and 120%.
But I am keeping a watch on this, as my batch runs at night with lots of information.
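Assuming the "include parameter" here is include_lines, a Filebeat 1.x prospector sketch might look like this (the path and pattern are hypothetical placeholders, not values from this thread):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - "C:/inetpub/logs/LogFiles/*/*.log"  # hypothetical log path
      include_lines: ["somekey=somevalue"]    # hypothetical regex; only matching lines are shipped
```

This filters at the source, so unwanted events never reach the Logstash pipeline at all.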