Slowness in Logstash throughput while reading from Topbeat: how to debug

Hi,

I am trying to use Logstash to consume messages from Topbeat. I set up a Topbeat -> Logstash -> Elasticsearch flow and I can see the data in Kibana. However, I keep seeing these messages in the Logstash logs:

  1. Beats input: the pipeline is blocked, temporary refusing new connection. {:level=>:warn}
  2. CircuitBreaker::Close {:name=>"Beats input", :level=>:warn}
  3. CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
  4. Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

I have a few questions regarding these:

  1. How can I figure out where the throughput delay is? Is it:
    • the Logstash input worker,
    • the Logstash output worker, or
    • Elasticsearch indexing throughput?
  2. Is there a way to monitor the indexing performance of an Elasticsearch cluster/node?

You can use Marvel for monitoring your cluster.

For Logstash though you'd want to look at the metrics filter. We're working on exposing more monitoring functionality with upcoming versions of LS.
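
A minimal sketch of what that can look like, using the metrics filter's meter option to emit a synthetic event carrying a one-minute event rate (the tag name "metric" and the 1m field are just the plugin's documented defaults for this pattern):

filter {
  metrics {
    # periodically emit a synthetic event with event-rate meters
    meter => "events"
    add_tag => "metric"
  }
}

output {
  # print only the synthetic metric events; real events keep flowing
  if "metric" in [tags] {
    stdout {
      codec => line { format => "1m event rate: %{[events][rate_1m]}" }
    }
  }
}

If the printed rate drops at the same time the circuit breaker warnings appear, that tells you the filter/output side of the pipeline is where things are stalling.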

Could you please tell me which metrics I can look at in Marvel to figure out any issues?

You can look at things like indexing throughput to see if it drops when you get this log entry.

Here is where I am stuck.

I am unable to find where the congestion in the stack is when I see the messages below:

CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

I am unable to diagnose whether the Elasticsearch instance is slow or the Logstash instance is slow. I do not see any pertinent information in either the Elasticsearch or Logstash logs that explains when and how the pipeline is determined to be in a blocked state. Unless I can determine that, I am not sure whether I need more Logstash instances, more Elasticsearch instances, a dedicated master node for Elasticsearch, or a queueing solution.

How do I find out/deduce this piece of information?
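
One way to narrow this down (a sketch, not an official diagnostic): run the same input and filters, but temporarily replace the elasticsearch output with the stdout dots codec, which prints one dot per processed event:

output {
  # one dot per event; Elasticsearch is out of the loop entirely
  stdout { codec => dots }
}

If throughput jumps with this output, Elasticsearch indexing is your bottleneck; if it stays low, the Logstash filter stage is. Since each dot is a single byte, piping Logstash's stdout through a byte-rate tool such as pv gives you an approximate events-per-second figure.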

Same problem here. I keep hitting this throughout the day and can't figure out why it happens every now and then. We definitely need better logging and monitoring capability in LS.

I have a pretty modest filter and a < 100 TPS load on a 4-CPU server, yet the circuit breaker still trips.
I can see 200%+ CPU usage from the Logstash Java process. Here is my config:

input {
  beats {
    port => 5045
    type => 'iis'
  }
}


# First filter
filter {
  #ignore log comments
  if [message] =~ "^#" {
    drop {}
  }

  grok {
    patterns_dir => "./patterns"
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:serverip} %{WORD:verb} %{PATH:request} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:auth} %{IPORHOST:clientip} %{NOTSPACE:agent} %{NUMBER:response} %{NUMBER:sub_response} %{NUMBER:sc_status} %{NUMBER:responsetime}" ]
  }
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    locale => "en"
  }  
}


# Second filter
filter {
  # on grok success, remove fields we no longer need, to save space
  if "_grokparsefailure" not in [tags] {
    mutate {
      remove_field => ["message", "timestamp"]
    }
  }
}

output {  
  if [system] {
    elasticsearch {
      hosts =>  ["10.35.132.143:9200","10.35.132.142:9200","10.35.76.37:9200"]
      index => "logstash-%{system}-%{group}-%{+YYYY.MM.dd}"
      template => "./conf/apache-mapping.json"
      template_name => "logstash"
      document_type => "%{type}"
      template_overwrite => true
      manage_template => true
    }
  } else {
    elasticsearch {
      hosts => ["10.35.132.143:9200","10.35.132.142:9200","10.35.76.37:9200"]
      index => "junk"
      document_type => "%{type}"
    }
  }

  #stdout { codec => rubydebug } 

}

Hi,

I'm having similar problems (error messages).
Just curious whether you have already solved the issue?
For me it is unclear whether it is a settings problem or grok being busy (multiple cores at 100% CPU), which might be causing the connection/pipeline problems.

In my case it was the grok pattern. I had to make the pattern simpler to make it work.
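
For anyone wondering what "simpler" can look like against the IIS pattern posted above, here is a sketch (the exact split is an assumption, not necessarily what was actually changed): anchor the match with ^ so non-matching lines fail fast, and break the single giant match into two cheaper passes:

filter {
  grok {
    # ^ anchors the match to the start of the line, so a non-matching
    # line fails immediately instead of being retried at every offset
    match => ["message", "^%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:serverip} %{WORD:verb} %{PATH:request} %{GREEDYDATA:rest}"]
  }
  # second, cheaper pass over just the remainder of the line
  grok {
    match => ["rest", "^%{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:auth} %{IPORHOST:clientip} %{NOTSPACE:agent} %{NUMBER:response} %{NUMBER:sub_response} %{NUMBER:sc_status} %{NUMBER:responsetime}$"]
    remove_field => ["rest"]
  }
}

The idea is that when the one big pattern fails near the end of the line, the regex engine backtracks across the whole line; splitting limits how much work a failure can trigger.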

Hello,

I have the same problem with a similar Logstash configuration (just a grok and a date in the filter). How did you split it? I have also tried, but without success.

Thanks!

For me it is not solved (I changed back to running Logstash instances on every machine).
I changed the grok pattern, which made Logstash process the messages somewhat faster.

What did you change in the grok? @Omar_Al_Zabir said that he split it into several patterns. I have a similar match, but I was not able to make it work.

Any help?

No, I still haven't found a solution. I have split each service onto a different VM and slowed down the number of messages per second, but this does not really solve the problem.

You can test the regex you are using on https://www.regex101.com/ and see the time and number of steps it takes. If it is over 100 steps, your regex is too expensive and Logstash will jam.
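
You can also measure the cost with Logstash itself rather than regex101, a sketch using the generator input to replay one representative line through the filter (the sample message below is invented; substitute a real line from your logs). Time the run with the grok filter in place and again with it commented out; the difference is what the pattern costs you:

input {
  # replay a single representative IIS line 100k times
  generator {
    count => 100000
    message => "2016-01-01 12:00:00 10.0.0.1 GET /index.html - 80 - 10.0.0.2 Mozilla 200 0 0 123"
  }
}

filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:serverip} %{WORD:verb} %{PATH:request} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:auth} %{IPORHOST:clientip} %{NOTSPACE:agent} %{NUMBER:response} %{NUMBER:sub_response} %{NUMBER:sc_status} %{NUMBER:responsetime}"]
  }
}

output {
  # one dot per event; a fast pattern finishes visibly sooner
  stdout { codec => dots }
}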

Still nothing?