Fairly new to the Elastic Stack, so I'm looking for some high-level design advice. Regarding syslog input, my requirement is for network devices (Cisco, Juniper, Pulse Secure, etc.) to send syslog to Logstash, with a filtered set of these logs then output to Elasticsearch. From reading around, it seems there are three basic options:
1. Use the Logstash syslog input plugin, with additional grok patterns for non-RFC3164 formats.
2. Use the Logstash UDP input plugin, with additional grok patterns for non-RFC3164 formats.
3. Build an rsyslog server (on the Logstash server) with an rsyslog JSON template to format the syslog data. Configure the network devices to send logs to the rsyslog server on UDP 514, configure rsyslog to forward to Logstash on a non-standard UDP port, and configure Logstash to receive the JSON-formatted events (json codec) from rsyslog.
Are these all sensible options? If so, is option 3 more likely to have performance concerns? For options 1 and 2, how difficult is it to create reliable grok patterns? Any general thoughts and advice much appreciated.
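For reference, this is roughly what I understand options 1 and 2 to look like as Logstash inputs (untested sketch on my part; port 10514 is just an arbitrary choice):

```
# Option 1: the syslog input plugin (parses RFC3164 on its own)
input {
  syslog {
    port => 10514
  }
}

# Option 2 (alternative, not used together with the above):
# the plain UDP input, with all parsing left to grok filters
input {
  udp {
    port => 10514
    type => "syslog"
  }
}
```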
What we do is have our systems send directly to Logstash, where all the parsing/filtering/reformatting is done before the log data are passed on. Putting an rsyslog server between the systems and Logstash seems like it would just add complexity without much benefit, unless rsyslog already has modules that can parse and JSON-ize the formats your vendors use.
Parsing log formats turns out to be a bit of an art because of the data-flow paradigm Logstash uses. You need to be aware of the available input plugins and filter plugins, then figure out how to combine them into a working pipeline.
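As a concrete example, a grok filter for an RFC3164-style line might look something like this (a minimal sketch; real device output usually needs vendor-specific patterns):

```
filter {
  grok {
    # Pull the standard syslog pieces out of the raw line; events the
    # pattern doesn't match get tagged _grokparsefailure for inspection.
    match => {
      "message" => "<%{NONNEGINT:priority}>%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{GREEDYDATA:syslog_message}"
    }
  }
}
```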
FWIW, note that you can also use TCP for the inbound data, and even use TLS to protect confidentiality/integrity, plus mutual client and server certificates for authentication. Also, Logstash can't bind to privileged ports like 514 if it's running as an unprivileged user (which it almost certainly should be), so you'll probably need to reconfigure your network elements to use a non-standard port regardless.
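A TLS-enabled TCP input with client-certificate verification looks roughly like this (the SSL option names have shifted across plugin versions, so treat this as a sketch and check the docs for yours; the paths are placeholders):

```
input {
  tcp {
    port => 6514   # unprivileged, and the conventional syslog-over-TLS port
    ssl_enable => true
    ssl_cert => "/etc/logstash/certs/logstash.crt"
    ssl_key  => "/etc/logstash/certs/logstash.key"
    ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
    ssl_verify => true   # require and verify a client certificate (mutual TLS)
  }
}
```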
The other thing we've done, which you might want to look into, is to create a separate pipeline for each type of log (so in your case, maybe a Cisco pipeline, a Juniper pipeline, etc.). This seems to have a couple of advantages: (1) it keeps the individual pipeline files small and focused on just that vendor, and (2) each pipeline runs independently, so a problem with one of them won't block the others. What we did was assign a different non-standard port to the input for each pipeline (say 10514 for Cisco, 10515 for Juniper), which keeps everything neatly separated. I don't know if this is the best way (I only started working with Logstash a couple of weeks ago), but it seems to work, and the separation has been helpful while debugging.

We also created a "testing" pipeline that we use when we want to try out changes or start parsing logs from a new vendor without impacting the others. It doesn't even have to send data to Elasticsearch: you can check the results in real time, or write them to a file to see if you're getting what you want.
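In case it helps, the wiring for that looks roughly like this in pipelines.yml (the IDs, paths, and ports are just the hypothetical ones from my example):

```
# /etc/logstash/pipelines.yml
- pipeline.id: cisco
  path.config: "/etc/logstash/conf.d/cisco.conf"    # input on 10514
- pipeline.id: juniper
  path.config: "/etc/logstash/conf.d/juniper.conf"  # input on 10515
- pipeline.id: testing
  path.config: "/etc/logstash/conf.d/testing.conf"
```

and the testing pipeline can just dump events to the console and/or a file instead of Elasticsearch:

```
input { tcp { port => 10599 } }   # any spare port

output {
  stdout { codec => rubydebug }                 # watch parsed events in real time
  file { path => "/tmp/logstash-testing.log" }  # or keep them for later review
}
```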
To be honest, I'm not sure, either! It seems like the syslog input plugin gives you some protocol-specific options, while the "raw" tcp/udp plugins give you more flexibility in tailoring the network connection. So for example, with the tcp plugin you can configure TLS, but I don't see an option for that in the syslog plugin.
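The protocol-specific bits of the syslog input are things like automatic priority/facility/severity parsing and a grok_pattern option you can override for devices that deviate from plain RFC3164; something like this (the pattern shown is, as far as I can tell, just the plugin's default, included for illustration):

```
input {
  syslog {
    port => 10514
    # Override for non-RFC3164 devices; the input still labels
    # facility and severity for you.
    grok_pattern => "<%{POSINT:priority}>%{SYSLOGLINE}"
  }
}
```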
Thanks for that pointer to the rsyslog article! I guess I still don't see the role of the rsyslog server, but maybe some manipulations are easier there than they are in Logstash? So far, at least, we've been able to do all our parsing and JSON conversions in Logstash with things like the kv filter plugin.
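For what it's worth, the kv usage is about as simple as it sounds; something like this splits a key=value message body into fields (the source field name here is hypothetical):

```
filter {
  kv {
    source      => "syslog_message"  # field holding the key=value body
    field_split => " "               # pairs separated by spaces
    value_split => "="               # keys and values separated by '='
  }
}
```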