Rsyslog+ELK Integration

Hi All,

I am new to the ELK stack. I am now trying to implement a centralized log server for one of the projects in my organization. The expectations for this implementation are:

  1. Ship logs (system logs, web server logs, firewall events, DB audit logs, application logs, etc.) from all client machines/devices to a centralized log server.

  2. Store these logs on the central server with regular archiving.

  3. Process the logs using the ELK stack and build some useful dashboards.

I have done some experiments with Rsyslog, and the centralized logging part is done: I can now collect all the events in separate log files on the central server, organized the way I want.

For archiving the logs, I am considering logrotate on the log directory, rotating the logs on a weekly/monthly basis.
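Something like this logrotate snippet is what I have in mind (the path, retention, and reload command are just placeholders for my setup):

```conf
# /etc/logrotate.d/central-syslog -- hypothetical paths and retention
/var/log/remote/*/*.log {
    weekly
    rotate 12          # keep ~3 months of rotated logs
    compress
    delaycompress      # don't compress the most recent rotation
    missingok
    notifempty
    sharedscripts
    postrotate
        # tell rsyslog to reopen its output files after rotation
        /usr/bin/systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
    endscript
}
```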

When considering the integration with ELK, a few questions come to mind:

  1. Rsyslog itself can output logs in JSON format, so if I set a JSON template on the rsyslog output, what is the point of using Logstash? Or should I go for Logstash filters instead of rsyslog's JSON output? Which is more powerful here?

  2. If I am using Logstash filters, can I apply more than one filter to the rsyslog output? And what is the best way to filter in Logstash if the rsyslog output contains messages from both /var/log/messages and Apache access logs mixed together?

  3. I am running all of these components (ELK + Rsyslog) on one server. Am I headed in the right direction for log archival, or is there another method?

Please advise.

Thanks,
Bhuvanesh

This largely depends on whether the data needs further parsing. Rsyslog does a good job of tokenizing syslog data into fields, but what if you need to parse those fields further? This is where outputting to Logstash can play an important role.
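For example, rsyslog can emit its parsed properties as JSON with a template along these lines (a sketch, not a drop-in config; the target host, port, and field names are examples):

```conf
# Sketch: rsyslog RainerScript template emitting standard syslog
# properties as JSON, forwarded to Logstash over TCP
template(name="json-template" type="list") {
  constant(value="{")
  constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"host\":\"")    property(name="hostname")
  constant(value="\",\"program\":\"") property(name="programname")
  constant(value="\",\"message\":\"") property(name="msg" format="json")
  constant(value="\"}\n")
}
action(type="omfwd" target="127.0.0.1" port="5514" protocol="tcp"
       template="json-template")
```

Note that even with this, the `message` field is still one opaque string. Pulling the client IP, response code, and user agent out of an Apache line buried inside it is exactly the further parsing where Logstash's grok earns its keep.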

Of course! Using conditionals, you can apply as many filters as needed, to as many different log formats as you may have.
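As a sketch (assuming the program name from rsyslog arrives in a `program` field; the field name and patterns are examples, not your exact config):

```conf
filter {
  if [program] == "apache-access" {
    # Apache access lines: use the stock combined-log pattern
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  } else if [program] == "tag_audit_log" {
    # audit lines: note the escaped literal parentheses
    grok { match => { "message" => "type=%{WORD:audit_type} msg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\):" } }
  }
  # anything else falls through as plain syslog, unparsed
}
```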

This approach will not scale very far. If the server is currently at an acceptable performance level for data ingest, you won't need more servers for that part. But you will likely need to expand your Elasticsearch cluster as you add more data. Try to have 1 index per data type per time interval, e.g. keep apache logs in their own index, which is created daily, weekly, or monthly. This is to keep mappings under control, and allow for easier management of shard counts across all data nodes in your Elasticsearch cluster. You may only need to keep syslog data for 2 weeks, but Apache data for 3 months. This way it is easy to keep only the data that matters.
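In the Logstash output, that separation can look like this (index names, hosts, and the `program` field are examples):

```conf
output {
  if [program] == "apache-access" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "apache-%{+YYYY.MM.dd}"    # daily Apache indices
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "syslog-%{+YYYY.MM.dd}"    # daily syslog indices
    }
  }
}
```

A tool like Curator can then delete `syslog-*` indices after 2 weeks and `apache-*` indices after 3 months, independently of each other.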

Hi Aaron,

Thanks for the advice! I think I need a bit more help.

I have created three grok patterns.

APACHE_ACCESS %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{IPORHOST:clientip} (?:-|%{USER:ident}) (?:-|%{USER:auth}) \[%{HTTPDATE:access_timestamp}\] "(?:%{WORD:request_type} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:-|%{NUMBER:bytes}) "%{NOTSPACE:request_uri}" "%{GREEDYDATA:User_agent}"

AUDIT %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: type=%{WORD:audit_type} msg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\): user pid=%{NUMBER:audit_pid} uid=%{NUMBER:audit_uid} auid=%{NUMBER:audit_audid} ses=%{NUMBER:audit_ses} msg=%{GREEDYDATA:audit_message}

ACTIVITY %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:logsource} %{USER:ssh_user}: %{USER:escalation} %{SYSLOGPROG} %{IPORHOST:clientip} %{GREEDYDATA:activity_message}

The logs I am expecting from the rsyslog forwarder server look like this:

================================
Aug 30 18:33:04 syslogclient01 root: root User-Activity 192.168.1.104 [59879]: touch test [0]

Aug 30 18:33:44 syslogclient01 tag_audit_log: type=CRYPTO_KEY_USER msg=audit(1472562224.404:56190): user pid=60001 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=session fp=? direction=both spid=60002 suid=74 rport=7021 laddr=172.20.20.151 lport=22 exe="/usr/sbin/sshd" hostname=? addr=172.20.20.152 terminal=? res=success'

Aug 30 18:33:12 syslogclient01 root: root User-Activity 192.168.1.104 [59974]: less /var/log/cron [0]

Aug 30 15:08:40 syslogclient01 apache-access: 192.168.1.104 - - [30/Aug/2016:15:08:34 +0530] "GET /_static/classic.css HTTP/1.1" 304 - "http://rsyslogdoc.com/" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41"

Aug 30 15:08:40 syslogclient01 apache-access: 192.168.1.104 - - [30/Aug/2016:15:08:34 +0530] "GET /_static/pygments.css HTTP/1.1" 304 - "http://rsyslogdoc.com/" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41"

Aug 30 18:33:43 syslogclient01 root: root User-Activity 192.168.1.104 [59974]: ps ax | grep tail [0]

Aug 30 18:33:44 syslogclient01 tag_audit_log: type=CRYPTO_KEY_USER msg=audit(1472562224.405:56192): user pid=60001 uid=0 auid=4294967295 ses=4294967295 msg='op=destroy kind=server fp=9d:ca:03:95:28:8e:a2:e3:f0:e8:70:fc:4e:b9:11:01 direction=? spid=60001 suid=0 exe="/usr/sbin/sshd" hostname=? addr=172.20.20.152 terminal=? res=success'

Aug 30 18:33:44 syslogclient01 tag_audit_log: type=USER_LOGIN msg=audit(1472562224.405:56193): user pid=60001 uid=0 auid=4294967295 ses=4294967295 msg='op=login acct="root" exe="/usr/sbin/sshd" hostname=? addr=172.20.20.152 terminal=ssh res=failed'

I am looking to apply the above grok patterns to incoming logs conditionally, based on %{SYSLOGPROG}: if %{SYSLOGPROG} == apache-access, apply the APACHE_ACCESS pattern; if it is User-Activity, apply ACTIVITY; and so on.

Is this feasible? I searched on Google, but no examples worked for me.

I hope this is the place where I can get some help!

Looking forward to your advice.

Thanks & Regards,
Bhuvanesh

You should start a new thread which is specific to your grok question. It's out of the scope of the original question here.

Also, this is why I do not send anything but syslog through syslog, or rsyslog, or any other variant. Though the following link is 3 years old, the configuration details on how to get Apache to log in JSON are still valid: http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/

NOTE: Do NOT use the Logstash configuration from the above link, as it is three years out of date. The link is only a reference for getting Apache to log in JSON.

In all honesty, I'd get Apache to log in JSON, and then have filebeat send it to Logstash for further parsing. Centralizing all logs is a worthy goal, but in my opinion, syslog should only be storing syslog, not apache or any other format.
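As a sketch, the Apache side can look something like this (the field names and file path are my choices, not anything canonical):

```conf
# Apache: emit access logs as one JSON object per line
# (field names and path are examples)
LogFormat "{ \"@timestamp\":\"%{%Y-%m-%dT%H:%M:%S%z}t\", \"clientip\":\"%a\", \"verb\":\"%m\", \"request\":\"%U%q\", \"response\":%>s, \"bytes\":%B, \"referrer\":\"%{Referer}i\", \"agent\":\"%{User-agent}i\" }" json_access
CustomLog /var/log/httpd/access_json.log json_access
```

Filebeat then ships that file to Logstash, where a simple `json` filter on the message field replaces all of the Apache grok work.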