Filebeat can't publish the data to Logstash when there is a lot of traffic (~100-150 lines per second). I have read many discussions about the "Error publishing events" message, but nothing really matches my case. In the Logstash log I can't find anything related to these errors.
2016-11-23T15:04:15Z INFO Start sending events to output
2016-11-23T15:04:18Z INFO Error publishing events (retrying): write tcp 10.11.141.8:60533->10.11.141.29:5045: write: broken pipe
2016-11-23T15:05:14Z INFO Error publishing events (retrying): EOF
2016-11-23T15:06:26Z INFO Error publishing events (retrying): read tcp 10.11.141.8:60648->10.11.141.29:5045: read: connection reset by peer
or
2016-11-23T15:00:19Z INFO Registry file updated. 1 states written.
2016-11-23T15:00:23Z INFO Error publishing events (retrying): read tcp 10.11.141.8:59954->10.11.141.29:5045: i/o timeout
2016-11-23T15:00:25Z INFO Error publishing events (retrying): EOF
2016-11-23T15:01:02Z INFO Error publishing events (retrying): EOF
2016-11-23T15:01:52Z INFO Error publishing events (retrying): write tcp 10.11.141.8:60137->10.11.141.29:5045: write: broken pipe
2016-11-23T15:02:07Z INFO Error publishing events (retrying): EOF
2016-11-23T15:02:24Z INFO Events sent: 2048
or
2016-11-06T09:20:12Z INFO Registry file updated. 37 states written.
2016-11-06T09:20:39Z INFO Error publishing events (retrying): EOF
2016-11-06T09:20:39Z INFO Events sent: 1
Those Filebeat log messages most likely indicate you are hitting the congestion threshold in Logstash and Logstash is dropping the connection.
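If you want to experiment with that threshold directly, the beats input of that era (Logstash 2.x) exposes a congestion_threshold option. A minimal sketch, assuming the port 5045 from your logs; raising the value only buys time, it does not fix a slow pipeline:

    input {
      beats {
        port => 5045
        # Seconds the input blocks before dropping slow connections
        # (default 5). Available on the 2.x-era beats input; deprecated
        # in later versions.
        congestion_threshold => 30
      }
    }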
You could do some benchmarking to find the bottleneck in your system. Test your LS pipeline with stdin/stdout and measure the throughput (stdout { codec => dots } works nicely if you pipe the output through pv -War > /dev/null; it will report the events-per-second rate).
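A minimal sketch of such a test setup — bench.conf and sample.log are placeholder names, and the filter block stands in for whatever filters you actually run:

    # bench.conf -- read from stdin, emit one dot per event
    input { stdin { } }
    filter {
      # paste your production filters here
    }
    output { stdout { codec => dots } }

    # Feed it a sample of your real log lines and measure events/second:
    bin/logstash -f bench.conf < sample.log | pv -War > /dev/null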
BTW, that logstash-beats-input you have defined on port 5045 looks incorrect because you have it set up with the json codec. But that should be unrelated to this problem.
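For comparison, a beats input normally needs no codec at all, since the beats protocol already delivers structured events. A sketch of what I'd expect instead:

    input {
      beats {
        port => 5045
        # no "codec => json" -- Filebeat events arrive already structured;
        # to parse JSON payloads, use a json filter on the message field instead
      }
    }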
Those throughput numbers (with outputs enabled) look too low to sustain a 100-150 lines-per-second rate. Have you run through the Performance Troubleshooting Guide? When running the throughput test it would be good to check whether your CPU is saturated. If not, possibly increase the number of workers (as mentioned in that guide; see the sketch below).
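As a sketch of the worker setting — the flag depends on your Logstash version (-w/--filterworkers in 2.x, -w/--pipeline.workers in 5.x), and the config path is a placeholder:

    # e.g. one filter worker per CPU core
    bin/logstash -w 4 -f /etc/logstash/conf.d/your-pipeline.conf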
It may be beneficial for you to post in the Logstash topic with your LS config and the test setup and results you got. There you may get a bit more visibility on the topic of tuning from the Logstash experts.