Hi,
We have a data lake of syslog datasets, and the devices/endpoints log to files named in the format /data/<collect_port>/<ip_address>/<severity>.<facility>.log
eg /data/514/172.128.4.2/auth.info.log
Within the datasets, the messages are in RFC 5424 format, e.g.:
<38>1 2020-07-15T18:40:49+01:00 client.hostname sshd 30077 - - user root login class [preauth]
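As an aside, the <38> prefix in that sample is the RFC 5424 priority value, which encodes facility and severity as priority = facility * 8 + severity. Decoding it is consistent with the auth.info.log file naming above (a quick sketch, not part of the original thread):

```python
# Decode an RFC 5424 priority value into its facility and severity numbers.
# priority = facility * 8 + severity, so divmod recovers both parts.
pri = 38
facility, severity = divmod(pri, 8)

# Facility 4 is "auth" (security/authorization) and severity 6 is
# "info" in the standard syslog numbering, matching auth.info.log.
print(facility, severity)  # → 4 6
```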
Obviously the default Filebeat parsing expressions cannot handle this.
It seems there is something in progress: https://github.com/elastic/beats/pull/15467. Maybe you can try out that branch; otherwise I would suggest you just ship the logs to a Logstash server and process them with a grok pattern.
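If you go the Logstash route, a minimal pipeline sketch could look like the following. This assumes the grok pattern library bundled with Logstash, which ships a SYSLOG5424LINE pattern that breaks out the priority, version, timestamp, hostname, app name, procid, msgid, structured data, and message fields:

```
filter {
  grok {
    # SYSLOG5424LINE is part of Logstash's bundled grok patterns;
    # it produces fields such as syslog5424_pri, syslog5424_ts,
    # syslog5424_host, syslog5424_app and syslog5424_msg.
    match => { "message" => "%{SYSLOG5424LINE}" }
  }

  # Optional: translate the numeric priority into facility/severity names.
  syslog_pri {
    syslog_pri_field_name => "syslog5424_pri"
  }

  # Use the message's own timestamp as the event timestamp.
  date {
    match => [ "syslog5424_ts", "ISO8601" ]
  }
}
```

This is a sketch, not a drop-in config; you would still add your input/output sections and verify the field names against your Logstash version.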
When you say "ship to Logstash", by what means? i.e. how does the data get into Logstash? Do you mean using Filebeat and treating everything as a "raw message" into Logstash?
What's the best method to ship to Logstash?
OK, but that's still a chicken-and-egg problem, whereby Filebeat would need to do basic parsing.
Is there an option to send raw field output (i.e. uncooked data) directly to Logstash,
so that Filebeat just reflects the exact/pure event from the data lake to Logstash without parsing it at all?
Yes, Filebeat will send the messages to Logstash without any pre-processing. In this setup Filebeat is used as a basic collector and Logstash as the aggregator, which performs the analysis of the logs.
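A minimal filebeat.yml for that setup might look like this (a sketch; the glob and the Logstash host/port are assumptions based on the thread, not tested config):

```yaml
# filebeat.yml -- collect the data-lake files and forward them untouched.
# The glob mirrors the /data/<collect_port>/<ip_address>/<severity>.<facility>.log
# layout described above; adjust it to your actual directory tree.
filebeat.inputs:
  - type: log
    paths:
      - /data/*/*/*.log

# Ship straight to Logstash. No modules or processors are enabled,
# so each line arrives in the "message" field exactly as it was read.
output.logstash:
  hosts: ["logstash.example.com:5044"]   # assumed hostname and port
```

With no modules or processors configured, Filebeat only adds its own metadata (host, file path, offset) around the untouched log line, which is exactly the "raw message" behaviour you asked about.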