Slowness in Logstash throughput while reading from Topbeat: how to debug

Hi,

I am trying to use Logstash to consume messages from Topbeat. I set up a Topbeat -> Logstash -> Elasticsearch flow and I can see the data in Kibana. However, I keep seeing these messages in the Logstash logs:

  1. Beats input: the pipeline is blocked, temporary refusing new connection. {:level=>:warn}
  2. CircuitBreaker::Close {:name=>"Beats input", :level=>:warn}
  3. CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
  4. Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

I have a few questions regarding these:

  1. How can I figure out where the throughput delay is? Is it:
    • the Logstash input worker,
    • the Logstash output worker, or
    • Elasticsearch indexing throughput?
  2. Is there a way to monitor the indexing performance of an Elasticsearch cluster/node?

You can use Marvel for monitoring your cluster.

For Logstash though you'd want to look at the metrics filter. We're working on exposing more monitoring functionality with upcoming versions of LS.
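
A minimal sketch of what that can look like, using the metrics filter's meter option to emit a synthetic event carrying a one-minute event rate (the tag name "metric" and the 1m field are just the plugin's documented defaults for this pattern):

filter {
  metrics {
    # periodically emit a synthetic event with event-rate meters
    meter => "events"
    add_tag => "metric"
  }
}

output {
  # print only the synthetic metric events; real events keep flowing
  if "metric" in [tags] {
    stdout {
      codec => line { format => "1m event rate: %{[events][rate_1m]}" }
    }
  }
}

If the printed rate drops at the same time the circuit breaker warnings appear, that tells you the filter/output side of the pipeline is where things are stalling.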

Could you please tell me which metrics I can look at in Marvel to figure out any issues?

You can look at things like indexing throughput to see if it drops when you get this log entry.

Here is where I am stuck.

I am unable to find where the congestion in the stack is when I see the messages below:

CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

I am unable to diagnose whether the Elasticsearch instance is slow or the Logstash instance is slow. I do not see any pertinent information in either the Elasticsearch or Logstash logs that explains when and how the pipeline is determined to be in a blocked state. Unless I can determine that, I am not sure whether I need more Logstash instances, more Elasticsearch instances, a dedicated master node for Elasticsearch, or a queueing solution.

How do I find out/deduce this piece of information?
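
One way to narrow this down (a sketch, not an official diagnostic): run the same input and filters, but temporarily replace the elasticsearch output with the stdout dots codec, which prints one dot per processed event:

output {
  # one dot per event; Elasticsearch is out of the loop entirely
  stdout { codec => dots }
}

If throughput jumps with this output, Elasticsearch indexing is your bottleneck; if it stays low, the Logstash filter stage is. Since each dot is a single byte, piping Logstash's stdout through a byte-rate tool such as pv gives you an approximate events-per-second figure.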

Same problem here. I keep hitting this throughout the day and can't figure out why it happens every now and then. We definitely need better logging and monitoring capability in LS.

I have a pretty modest filter and a < 100 TPS load on a 4-CPU server, yet the circuit breaker still trips.
I can see 200%+ CPU usage from the Logstash Java process. Here is my config:

input {
  beats {
    port => 5045
    type => 'iis'
  }
}


# First filter
filter {
  #ignore log comments
  if [message] =~ "^#" {
    drop {}
  }

  grok {
    patterns_dir => "./patterns"
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:serverip} %{WORD:verb} %{PATH:request} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:auth} %{IPORHOST:clientip} %{NOTSPACE:agent} %{NUMBER:response} %{NUMBER:sub_response} %{NUMBER:sc_status} %{NUMBER:responsetime}" ]
  }
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    locale => "en"
  }  
}


# Second filter
filter {
  # on grok success, remove fields we no longer need, to save space
  if "_grokparsefailure" not in [tags] {
    mutate {
      remove_field => ["message", "timestamp"]
    }
  }
}

output {  
  if [system] {
    elasticsearch {
      hosts =>  ["10.35.132.143:9200","10.35.132.142:9200","10.35.76.37:9200"]
      index => "logstash-%{system}-%{group}-%{+YYYY.MM.dd}"
      template => "./conf/apache-mapping.json"
      template_name => "logstash"
      document_type => "%{type}"
      template_overwrite => true
      manage_template => true
    }
  } else {
    elasticsearch {
      hosts => ["10.35.132.143:9200","10.35.132.142:9200","10.35.76.37:9200"]
      index => "junk"
      document_type => "%{type}"
    }
  }

  #stdout { codec => rubydebug } 

}

Hi,

I'm having similar problems (error messages).
Just curious whether you have already solved the issue?
For me it is unclear whether it is a settings problem or grok being busy (multiple cores at 100% CPU), which might be causing the connection/pipeline problems.

In my case it was the grok pattern. I had to make the pattern simpler to make it work.
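
For anyone wondering what "simpler" can look like against the IIS pattern posted above, here is a sketch (the exact split is an assumption, not necessarily what was actually changed): anchor the match with ^ so non-matching lines fail fast, and break the single giant match into two cheaper passes:

filter {
  grok {
    # ^ anchors the match to the start of the line, so a non-matching
    # line fails immediately instead of being retried at every offset
    match => ["message", "^%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:serverip} %{WORD:verb} %{PATH:request} %{GREEDYDATA:rest}"]
  }
  # second, cheaper pass over just the remainder of the line
  grok {
    match => ["rest", "^%{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:auth} %{IPORHOST:clientip} %{NOTSPACE:agent} %{NUMBER:response} %{NUMBER:sub_response} %{NUMBER:sc_status} %{NUMBER:responsetime}$"]
    remove_field => ["rest"]
  }
}

The idea is that when the one big pattern fails near the end of the line, the regex engine backtracks across the whole line; splitting limits how much work a failure can trigger.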

Hello,

I have the same problem with a similar Logstash configuration (just a grok and a date in the filter). How did you split it? I have also tried, but without success.

Thanks!

For me it is not solved (I changed back to running Logstash instances on every machine).
I changed the grok pattern, which made Logstash process the messages somewhat faster.

What did you change in the grok? @Omar_Al_Zabir said that he split it into several patterns. I have a similar match, but I was not able to make it work.

Any help?

No, I still haven't found a solution. I have split each service onto a different VM and slowed down the number of messages per second, but this does not really solve the problem.

You can test the regex you are using on https://www.regex101.com/ and see the time and number of steps it takes. If it is over 100 steps, your regex is too expensive and Logstash will jam.
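
You can also measure the cost with Logstash itself rather than regex101, a sketch using the generator input to replay one representative line through the filter (the sample message below is invented; substitute a real line from your logs). Time the run with the grok filter in place and again with it commented out; the difference is what the pattern costs you:

input {
  # replay a single representative IIS line 100k times
  generator {
    count => 100000
    message => "2016-01-01 12:00:00 10.0.0.1 GET /index.html - 80 - 10.0.0.2 Mozilla 200 0 0 123"
  }
}

filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:serverip} %{WORD:verb} %{PATH:request} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:auth} %{IPORHOST:clientip} %{NOTSPACE:agent} %{NUMBER:response} %{NUMBER:sub_response} %{NUMBER:sc_status} %{NUMBER:responsetime}"]
  }
}

output {
  # one dot per event; a fast pattern finishes visibly sooner
  stdout { codec => dots }
}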

Still nothing?