Filebeat fails to send events to Logstash

Filebeat stops sending events to Logstash after a while. Initially, when I set up Filebeat (three instances), it was successfully sending events to Logstash, but after a couple of hours it threw a "send fail" error.

My production structure is as below:

Filebeat (1.2) -------------------------> Logstash (2.0) -----------------------> Elasticsearch (AWS service)

If I enable only one Filebeat client, it works perfectly. When I enable multiple Filebeat instances, I get the above error.

I use the multiline config in Filebeat and send three different document types to Logstash. I am not using SSL; messages go from Filebeat to Logstash over TCP port 5043.
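Roughly, the relevant parts of my filebeat.yml look like this (paths, patterns, and type names are simplified placeholders; only one of the three prospectors is shown):

    filebeat:
      prospectors:
        -
          paths:
            - /var/log/myapp/app1.log    # placeholder path
          document_type: app1_log        # one of the three document types
          multiline:
            pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'   # lines not starting with a date are joined to the previous event
            negate: true
            match: after

    output:
      logstash:
        hosts: ["my-logstash-host:5043"]   # placeholder host, plain TCP, no SSL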

Your help would be appreciated.

Thanks in advance.

Providing the entire error would be helpful.

Hi Mark,

Below are the error messages from Filebeat and Logstash:

Filebeat
2016-04-27T06:33:22.74-04:00 INFO backoff retry: 1m0s
2016-04-27T06:33:22.74-04:00 INFO Error publishing events (retrying): EOF
2016-04-27T06:33:22.74-04:00 INFO send fail

Logstash

{:timestamp=>"2016-04-27T06:33:22.743000+0000", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}

{:timestamp=>"2016-04-27T06:33:22.758000+0000", :message=>"Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

Is Elasticsearch even running? Is Elasticsearch properly scaled? The Logstash error indicates that the Logstash output and/or filtering is way too slow. Filebeat having reached a retry backoff of 1m indicates that some bad condition has been active for quite some time.
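A quick way to check that the cluster is reachable and healthy is something like this against your AWS Elasticsearch endpoint (the endpoint is a placeholder):

    # cluster status (green/yellow/red), node count, unassigned shards
    curl -s 'https://your-aws-es-endpoint/_cluster/health?pretty'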

Hi Steffen,

Thanks for the reply. Yes, Elasticsearch is running.
Regarding scaling, please guide me, as I am using the AWS Elasticsearch service and I am new to both AWS and Elasticsearch.
I would like to know more about your statement "Having reached a retry of 1m in filebeat indicates some bad condition being active for quite some time." What could the bad conditions be?

Which Elasticsearch AWS service are you using? Elastic Cloud?

Filebeat uses an exponential backoff on error: first 1 second, then 2 seconds, 4 seconds, 8 seconds, 16 seconds, and so on, up to 1 minute. After successfully publishing events, the wait timer is reset to 1 second. That is, the condition of Logstash not being able to send data to Elasticsearch in time seems to have held for quite some time. It's most likely a problem with Elasticsearch or the Logstash output configuration, not Beats.

Check your Logstash logs for any other errors. Have you checked with netstat whether Logstash has active connections to Elasticsearch? Are any documents being indexed in Elasticsearch? What's the indexing rate?
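For example, something along these lines (the endpoint is a placeholder; the AWS Elasticsearch service is usually reached over HTTPS on port 443):

    # connections from the Logstash host to Elasticsearch
    sudo netstat -tnp | grep ':443'
    # indices and document counts
    curl -s 'https://your-aws-es-endpoint/_cat/indices?v'
    # indexing stats; run twice a minute apart and compare index_total to estimate the rate
    curl -s 'https://your-aws-es-endpoint/_stats/indexing?pretty'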

Do you have Marvel installed? It'll give you insight into what your cluster is doing.

Yes, the AWS Elastic Cloud service.

  1. No errors recorded in the Logstash error file.
  2. Yes, Logstash has an active connection.
  3. Yes. total_docs:570541144
  4. index_time_in_millis:15385738

Marvel is not installed.

Does the problem persist, or does it happen only from time to time?

Monitoring/Marvel is part of Elastic Cloud, I think.

The main problem is Logstash being slow or slowed down by its output. Checking the Logstash docs, there is a config option to change the [timeout within Logstash](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html#plugins-inputs-beats-congestion_threshold). The default is 5 seconds. Try setting congestion_threshold => 600 (10 minutes) in your Logstash config.
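In the beats input that would look roughly like this (port 5043 and no SSL taken from your description):

    input {
      beats {
        port => 5043
        congestion_threshold => 600   # seconds before the circuit breaker trips
      }
    }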

Thanks, Steffen. I will change the Logstash config and test it.
Do I need to change any config in Elasticsearch, e.g. the number of nodes or shards?

No idea about your Elasticsearch setup. What exactly is the name of your cluster?

Hi Steffen,

I made the change to the congestion threshold and it was running fine for the last two days. Then suddenly the Logstash log file grew in size and the service stopped. When I investigated, log rotation was not happening as the size increased. Also, although Logstash parses the log messages successfully, I could not see the messages in Kibana.

Kibana reads its data from Elasticsearch. I guess Elasticsearch has become unavailable, letting the Logstash logs go bonkers. What do your logs say?
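If Elasticsearch itself is the problem, disk usage per data node is also worth checking (the endpoint is a placeholder):

    # shard count, disk used and disk available per data node
    curl -s 'https://your-aws-es-endpoint/_cat/allocation?v'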

Hi Steffen,

As the Logstash log file size increases rapidly, I disabled Logstash logging in the /etc/init.d/logstash file. I think logrotate is not rotating based on size; it rotates daily, I believe. Since I disabled the log, my Logstash is running smoothly, but I am still not able to view the data in Kibana. As you said, I will check Elasticsearch and get back to you.

Why not configure a size limit in your logrotate config? See the man page.
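A sketch of what that could look like (the log path and sizes are assumptions, adjust to your setup):

    /var/log/logstash/*.log {
        size 100M        # rotate as soon as the file exceeds 100 MB
        rotate 5         # keep 5 rotated files
        compress
        missingok
        notifempty
        copytruncate     # truncate in place so Logstash keeps its file handle
    }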

Hi Steffen,

I checked Elasticsearch: there is no disk space available, and that is why I was not able to view data in Kibana. Thanks for your guidance. Logstash was writing everything to its output log file because Elasticsearch had no disk space left; that was the root cause. Now everything is working fine. I will monitor it for a couple of days and share the status with you. Thanks for your patience and timely support.