I had a setup working that used Logstash with a UDP input and RabbitMQ output to consume a high rate of remote syslog messages and publish them to Elasticsearch (with a second Logstash instance using RabbitMQ as input and Elasticsearch as output). I found that Java was using 3 cores at 100% to handle the load (though it was handling it).
I am now trying to convert to Filebeat as the transport instead of RabbitMQ, hoping that Filebeat can do the job with a much lower performance hit.
But I am finding that most messages are being dropped (likely due to the unnecessary file I/O). I also tried a pipe via mkfifo, but IIRC Filebeat didn't load.
So the question is: is there any way for Filebeat (or Packetbeat, if it can extract the message) to listen directly on port 514? That would let me publish the syslog data without the file I/O overhead.
I can't install Filebeat directly on the source, as it's a 3rd-party embedded system; I can only configure it to remote-syslog to an arbitrary IP/port.
I believe the messages are being dropped between rsyslog and the kernel, as rsyslog can't consume fast enough while trying to write to the filesystem. I will try running an application that prints the messages to stdout and piping that to Filebeat consuming from stdin, which should eliminate any filesystem I/O bottlenecks.
I wrote a small Go program that listens on port 514/udp and just prints the received bytes to stdout. I run it, piping its output to Filebeat, which is configured to read from stdin and forward to Logstash. The performance is fantastic: CPU sits around 50% of one core, which is great compared to my first solution.
The only hiccup was that Filebeat puts the contents from stdin into a field called "text" instead of "message", so I had to add a mutate filter in Logstash ahead of my other filters to get the expected behavior:
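The filter was roughly this (the "text" field name assumes the older Filebeat stdin behavior described above; newer versions may differ):

```
filter {
  mutate {
    # Filebeat's stdin input put the payload in "text";
    # rename it so downstream filters see the usual "message" field.
    rename => { "text" => "message" }
  }
}
```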
I think a case could be made for any of the existing Logstash inputs to have a corresponding "beat" implementation, for anyone whose use case makes running Logstash on the remote machine too resource-intensive.
Ultimately, lumberjack/logstash-forwarder was born out of the same need, so really the question becomes: what is Elastic's desired vision for the recommended deployment model? Logstash on all the "source" nodes, with "beats" replacing it only where performance is an issue? Or always deploy an appropriate "beat" and only deploy Logstash on a central, well-provisioned server?
If the latter, then from an architectural perspective we are basically recommending rewriting all of Logstash's inputs in Go instead of Java, and breaking them out into separate executables (which someone is going to suggest become a single executable again).
Anyway, back to your question: in the short term, if anyone else has a similar need, it makes a lot of sense IMO to make a syslogbeat. I'll try to find some spare time to look at contributing back to help.
@matt.koivisto About your architecture question: Beats will not replace Logstash on all source nodes, because Logstash has lots of additional capabilities. But in some cases they definitely will, as described in your second option, especially as soon as we support filtering and multiline. A Beat should be used when a lightweight solution is needed to "just" forward the data without processing it. The goal is to keep Beats as lightweight as possible, so that resource usage stays low. I'm quite sure you are right that lots of different "input" beats will also be created by the community. As the Beats project is still very young, it remains to be seen where this will lead.