and I'm experiencing a lot of missing events from Filebeat.
The Filebeat logs report:
2016-06-09T10:34:00+01:00 INFO backoff retry: 1m0s
2016-06-09T10:35:05+01:00 INFO Error publishing events (retrying): EOF
2016-06-09T10:35:05+01:00 INFO send fail
2016-06-09T10:35:05+01:00 INFO backoff retry: 1m0s
2016-06-09T10:36:15+01:00 INFO Error publishing events (retrying): EOF
2016-06-09T10:36:15+01:00 INFO send fail
2016-06-09T10:36:15+01:00 INFO backoff retry: 1m0s
2016-06-09T10:37:20+01:00 INFO Error publishing events (retrying): EOF
2016-06-09T10:37:20+01:00 INFO send fail
2016-06-09T10:37:20+01:00 INFO backoff retry: 1m0s
and Logstash reports:
{:timestamp=>"2016-06-09T10:35:05.926000+0100", :message=>"Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
{:timestamp=>"2016-06-09T10:36:15.387000+0100", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}
{:timestamp=>"2016-06-09T10:36:15.388000+0100", :message=>"Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
{:timestamp=>"2016-06-09T10:37:20.406000+0100", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}
{:timestamp=>"2016-06-09T10:37:20.407000+0100", :message=>"Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
My Logstash instance uses 257 MB of RAM on average, and Filebeat sits at 20.8 MB.
My Elasticsearch cluster indexes about 1.1M events per hour across 6 beefed-up nodes with the following stats
The Beats input plugin uses a circuit breaker that closes connections if the input plugin cannot push events into the pipeline. The default timeout of the circuit breaker is 5 seconds. In addition, Beats might break the connection and resend if Logstash is unresponsive for N seconds (default = 30 seconds, I think).
(optional) bulk_max_size in Filebeat. Reducing the bulk size has little effect in Logstash, but ACKs might be returned earlier by Logstash, reducing the chance of timeouts in Filebeat (see the sketch below).
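A hedged sketch of that option in a Filebeat 1.x-style logstash output section (host and value are illustrative, not recommendations):

output:
  logstash:
    hosts: ["logstash-host:5044"]
    bulk_max_size: 1024   # illustrative value; smaller batches can be flushed and ACKed by Logstash sooner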
I'd recommend setting congestion_threshold to a very large value (effectively years) in order to disable the circuit breaker, plus setting timeout in Filebeat to some higher acceptable value: at least twice the maximum timeout of the Logstash outputs times the per-event processing overhead, given the problem is not slow filters (e.g. 120 seconds). Monitor the Filebeat logs (info level) or Logstash for reconnects and update the Beats timeout accordingly.
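For example, a minimal sketch of that tuning with illustrative values, assuming a logstash-input-beats 2.x input and a Filebeat 1.x-style config (port, host and numbers are placeholders to adjust for your pipeline):

# Logstash pipeline config: a very large congestion_threshold (default 5s) effectively disables the circuit breaker
input {
  beats {
    port => 5044
    congestion_threshold => 99999999
  }
}

# filebeat.yml: raise the ACK timeout (default ~30s) so Filebeat waits longer before resending
output:
  logstash:
    hosts: ["logstash-host:5044"]
    timeout: 120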
The root cause is most likely the output not being very responsive (slow), or some Logstash filter stalling or slowing down the pipeline (e.g. an inefficient grok filter).
I have no idea about Logstash (grok) filter optimization. Did you try to measure throughput in Logstash itself? Maybe someone in the Logstash forum can help in case increasing congestion_threshold doesn't work.
I've run overnight with congestion_threshold => 99999999 in Logstash and timeout: 320 in Filebeat, and it was smooth.
I didn't get any circuit breaker warnings, and the events seem to be in Elasticsearch.
So does that mean the Elasticsearch cluster is all right, but something needs to be done with Logstash? Or am I completely wrong?