High rate Remote Syslog into filebeat


#1

I had a working setup that used logstash with a udp input and a rabbitmq output to consume a high rate of remote syslog messages and publish them into Elasticsearch (a second logstash instance used rabbitmq as its input and output to Elasticsearch). I found that java was using 3 cores at 100% to handle the load (though it was keeping up).

I am now trying to switch to filebeat as the transport instead of rabbitmq, in the hope that filebeat can do the job with a much lower performance hit.

So before I had:

syslog src -> udp 514 -> logstash (local) -> rabbitmq (AWS) -> logstash (AWS) -> ES (AWS)

And I am now trying to move to:

syslog src -> udp 514 -> rsyslog -> /var/log/file.log -> filebeat -> logstash (AWS) -> ES (AWS)

but I am finding that most messages are being dropped (likely due to the unnecessary file io). I also tried a pipe via mkfifo, but IIRC filebeat didn't load.

So the question is, is there any way for filebeat (or packetbeat if it can pull the message for that matter) to listen directly on port 514? This would then allow me to publish the syslog data w/o the file i/o overhead.


(Mark Walkom) #2

Filebeat will only ever read files. Packetbeat probably won't work as it's application level.

Have you tried redis instead of MQ?


(ruflin) #3

Filebeat currently does not support mkfifo. It supports stdin in case this helps.

Where in the above chain are the packets dropped? Why don't you install filebeat directly on the "syslog src" machine?


#4

I have not tried redis. Is there a reason to believe that the rabbitmq output from logstash is what is consuming the most CPU in java?


#5

@ruflin

I can't install filebeat directly on the src as it's a 3rd party embedded system; I can only configure it to send remote syslog to an arbitrary ip/port.

I believe the messages are being dropped between the kernel and rsyslog, because rsyslog can't consume them fast enough while it is also writing to the filesystem. I will try running an application that prints the messages to stdout and pipe that to filebeat reading from stdin, which should eliminate any filesystem io bottleneck.


#6

I wrote a small Go program that listens on port 514/udp and just prints the received bytes to stdout. I run it with its output piped to filebeat, which is configured to read from stdin and forward to logstash. The performance is fantastic: CPU sits around 50% of one core, which is great compared to my first solution.

The only hiccup was filebeat puts the contents from stdin into a field called "text" instead of "message", so I had to write a mutate filter in logstash before my other filters to get the expected behavior:

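# Match the events that came in via stdin (their source is "-")
if [source] == "-" {
  # Drop any existing "message" field first so the rename cannot collide
  mutate {
    remove_field => ["message"]
  }
  # Move the stdin payload into the field the later filters expect
  mutate {
    rename => { "text" => "message" }
  }
}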

Thanks for your help/suggestions!


(Tudor Golubenco) #7

Nice, it might make sense to transform this into a "Syslogbeat" so you only have to run one process. What do you think?


#8

I think any of the existing logstash input filters could have a case made for them to have a corresponding "beat" implementation, if running logstash on the remote machine was too resource intensive for an individual's use case.

Ultimately, lumberjack/logstash-forwarder was born out of the same need, so really the question becomes what is the desired vision for elastic's recommended deployment model? Logstash on all the "source" nodes, with "beats" to replace them iff performance is an issue? Or recommend always deploy an appropriate "beat" and only deploy logstash on a central well provisioned server?

If the latter, then from an architectural perspective, we are basically recommending re-writing all the input filters of logstash in go instead of java, and breaking them out into separate executables (which someone is going to suggest become a single executable again :slight_smile: )

Anyways, back to your question, in the short term if anyone else has a similar need, it makes a lot of sense IMO to make a syslogbeat. I'll try to get some spare time to look at contributing back to help :smile:


(ruflin) #9

@matt.koivisto Which version of filebeat are you using? I ask because about 7 days ago we changed the field name from text to message, and beta4 should already include this change: https://github.com/elastic/filebeat/blob/89410957be0163fe2e999cab0efffdeb6ab926c5/input/file.go#L56


(ruflin) #10

@matt.koivisto About your architecture question: Beats will not replace Logstash on all source nodes, because Logstash has lots of additional capabilities. But in some cases they definitely will, as described in your second option, especially once we support filtering and multiline. A Beat should be used when a lightweight solution is needed to "just" forward the data without processing it. The goal is to keep Beats as lightweight as possible so that resource usage stays low. I'm quite sure you are right that lots of different "input" Beats will also be created by the community. As the Beats project is still very young, it remains to be seen where this will lead.

