Collecting from a syslog data lake

Hi,
We have a data lake of syslog datasets, and the devices/endpoints log to files named in the format
/data/<collect_port>/<ip_address>/<severity>.<facility>.log
e.g.
/data/514/172.128.4.2/auth.info.log

Within the datasets, the messages are in RFC 5424 format:

<38>1 2020-07-15T18:40:49+01:00 client.hostname sshd 30077 - - user root login class [preauth]

Obviously the grok expressions in the default Filebeat modules cannot parse this.

What I need:

  • host.ip taken from the name of the file
  • host.name taken from the client hostname field in the message
  • Is there any ready-made pattern for the RFC 5424 format?

Hi!

It seems there is something in progress: https://github.com/elastic/beats/pull/15467. Maybe you can try out that branch; otherwise I would suggest simply shipping the logs to a Logstash server and processing them with a grok pattern.

@ChrsMark
Thanks for the info.

When you say ship to Logstash, by what means? I.e. how do the logs get into Logstash? Do you mean using Filebeat and treating everything as a "raw message" sent to Logstash?
What's the best method to ship to Logstash?

Yes, shipping from Filebeat to Logstash: https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html

And processing the logs on Logstash's side with grok patterns: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
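
For example, a minimal Logstash pipeline along these lines could cover all three points. This is only a sketch, not something tested against your data: it assumes the stock SYSLOG5424LINE grok pattern, the log.file.path field that Filebeat records for each harvested file, and the /data/<collect_port>/<ip_address>/<severity>.<facility>.log layout from your post; the beats port, the Elasticsearch output and the capture names (collect_port, path_ip, etc.) are placeholders.

    # Hypothetical pipeline.conf - layout taken from the post above;
    # ports, hosts and capture names are assumptions.
    input {
      beats {
        port => 5044
      }
    }

    filter {
      # Parse the RFC 5424 line with the stock SYSLOG5424LINE grok pattern.
      grok {
        match => { "message" => "%{SYSLOG5424LINE}" }
      }

      # Pull the collector port and the IP address out of the file path that
      # Filebeat records in log.file.path (capture names follow the
      # <severity>.<facility> naming from the post).
      grok {
        match => { "[log][file][path]" => "/data/%{NUMBER:collect_port}/%{IP:path_ip}/%{DATA:file_severity}\.%{DATA:file_facility}\.log" }
      }

      # host.name from the hostname inside the message, host.ip from the path.
      mutate {
        rename => {
          "syslog5424_host" => "[host][name]"
          "path_ip"         => "[host][ip]"
        }
      }

      # Use the timestamp carried in the message as the event timestamp.
      date {
        match => [ "syslog5424_ts", "ISO8601" ]
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
      }
    }

With something like this, all of the field extraction happens on the Logstash side and Filebeat only has to ship the raw lines.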

OK, but that's still a chicken-and-egg problem, whereby Filebeat would need to do some basic parsing.
Is there an option to send a raw field (i.e. uncooked data) directly to Logstash,
as though Filebeat simply reflects the exact/pure event from the data lake to Logstash without parsing it at all?

Yes, Filebeat will send the messages to Logstash without any pre-processing. In this setup Filebeat is used as a basic collector and Logstash as the aggregator, which performs the analysis of the logs.
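
To illustrate the "raw collector" part, a minimal filebeat.yml could look roughly like the sketch below. It assumes the filestream input reading your /data/*/*/*.log layout and a Logstash endpoint on port 5044; the paths, the input id and the host are placeholders.

    # Hypothetical filebeat.yml - paths, id and the Logstash host are placeholders.
    filebeat.inputs:
      - type: filestream
        id: syslog-datalake
        paths:
          - /data/*/*/*.log

    # No modules or processors: each line is forwarded untouched in the
    # "message" field, along with log.file.path, for Logstash to parse.
    output.logstash:
      hosts: ["logstash.example.com:5044"]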

C.
