Beats input blocked

Hi,

I'm experiencing problems with the beats input in Logstash; I'm getting the following error:

{:timestamp=>"2016-04-15T10:18:32.126000+0200", :message=>"CircuitBreaker::Open", :name=>"Beats input", :level=>:warn}
{:timestamp=>"2016-04-15T10:18:32.127000+0200", :message=>"Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::OpenBreaker, :level=>:warn}

and after that, lots of errors like the following appear:

{:timestamp=>"2016-04-15T10:18:33.628000+0200", :message=>"Beats input: the pipeline is blocked, temporary refusing new connection.", :reconnect_backoff_sleep=>0.5, :level=>:warn}

I'm using filebeat 1.2.1, Logstash 2.3.1, logstash-input-beats 2.2.7, and Elasticsearch 2.0.0. My Logstash config file:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    # square brackets are regex metacharacters in grok, so they are escaped here
    match => { "message" => "\[%{HTTPDATE:msg_timestamp}\] \[%{WORD:process_name}\] \[%{WORD:log_level}\] \[%{GREEDYDATA:location}\] %{GREEDYDATA:log_message}" }
  }
  date {
    match => ["msg_timestamp", "dd/MMM/YYYY:HH:mm:ss Z", "dd/MMM/YYYY: H:mm:ss Z"]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => [ "elasticsearch:9200" ]
    index => "logs-%{+YYYY.MM.dd}"
  }
}

I have seen this question raised many times and I have tried what is usually suggested: changing the congestion_threshold, the number of workers, etc. It seemed the problem came from the grok filter, so I also tried splitting it into several smaller ones, but it still does not work.
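For example, I tried raising the breaker timeout on the input and running with more workers, along these lines (the values here are just ones I experimented with, not recommendations):

input {
  beats {
    port => 5044
    # allow the pipeline more time before the circuit breaker opens (default is 5s)
    congestion_threshold => 30
  }
}

and started Logstash with extra pipeline workers, e.g. bin/logstash -w 4 -f logstash.conf.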
Any suggestions/ideas? Is this problem planned to be solved in a future release?

Many thanks in advance

How do you know this? Have you tried running without the grok?
Are you monitoring ES to make sure it is not overloaded?

I was mistaken; I thought it could be the grok because I was sending less data. I have just tried to send a file with more than 8000 events, and within the first few hundred events I'm getting the same error.

Changing the output from elasticsearch to a file seems to work. What could the problem with ES be? Should I change something in its configuration?
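For that test I replaced the output section with a plain file output, roughly like this (the path is just an example):

output {
  file {
    path => "/tmp/logstash-test.log"
  }
}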

You need monitoring, otherwise you are wandering in the dark.
Do you have Marvel installed?
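If not, on the 2.x stack installing it is roughly the following (treat this as a sketch, the exact commands depend on your versions):

# on each Elasticsearch node
bin/plugin install license
bin/plugin install marvel-agent
# and the Marvel UI into Kibana
bin/kibana plugin --install elasticsearch/marvel/latest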

Hi,

I'm starting to use Marvel but I couldn't see anything weird going on. For the node, the CPU and JVM heap percentages are really low (less than 7%). For the index, at the moment the errors appear I didn't see anything strange, but I'm not sure what the expected normal behaviour should be.

Also, I have tried with the current version of each package (2.3 for ES and LS, Kibana 4.5, filebeat 2.1) but it still doesn't work. However, the 5.0 alpha seems to work once I take out the date filter in Logstash. Could that be the problem?

Also, when shipping the whole file directly from filebeat to ES, there are no errors and all events are collected in ES. So I'm starting to get lost... how can I determine where the problem is? How can I distinguish whether Logstash or Elasticsearch is the culprit in this case?
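One thing I'm thinking of trying to isolate it is measuring the raw pipeline throughput with a trivial output, something along these lines (the pv part is just an idea I've seen suggested):

output {
  stdout { codec => dots }    # prints one dot per event
}

# then count events per second, e.g.
# bin/logstash -f logstash.conf | pv -Wbrat > /dev/null

If the dots output keeps up, that would point at the elasticsearch output / ES side rather than the filters.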

Thanks!

I have also seen that at the same moment I'm getting the errors from Logstash, filebeat logs the following:

2016/04/25 12:12:44.357277 single.go:76: INFO Error publishing events (retrying): EOF
2016/04/25 12:12:44.357332 single.go:152: INFO send fail
2016/04/25 12:12:44.357346 single.go:159: INFO backoff retry: 1s
2016/04/25 12:14:15.413336 single.go:76: INFO Error publishing events (retrying): read tcp 137.138.161.19:43436->128.142.201.204:5044: i/o timeout

Any solution?

Unfortunately no. I'm trying to monitor the CPU usage of filebeat by enabling cpuprofile at launch, and I'm also using Marvel. I have also seen that there are some filebeat settings that can be tuned to improve performance: bulk_max_size, max_retries, timeout, but I still haven't found a good combination of all the parameters.
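Those go in the logstash output section of filebeat.yml, something like this (the host is a placeholder and the values are just ones I'm trying):

output:
  logstash:
    hosts: ["logstash-host:5044"]
    bulk_max_size: 1024   # max events per batch sent to Logstash
    timeout: 30           # seconds to wait before a send is considered failed
    max_retries: 3        # retries before events are dropped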
Are you having the same issue?