Filebeat with grok, JavaScript and AVRO codec


(Lieven Merckx) #1

In our company we unified our logging system around a single AVRO schema pushed through Kafka. As we needed a log shipper in combination with our ELK stack, we looked at Filebeat.
As each application decided on its own format in the past, the centralised approach made it necessary to remap the data. We also didn't want to handle that in a single central team, so we wanted the re-mapping and pattern matching to be distributed. We therefore extended Filebeat with two processors, grok and an embedded JavaScript engine, and added a new codec for the AVRO schema.
The result : https://github.com/vortex314/beats
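
For illustration, here is a minimal sketch of the kind of extraction such a grok processor performs, using Go's standard regexp package with named capture groups (the pattern and field names are invented for the example, not taken from the fork):

```go
package main

import (
	"fmt"
	"regexp"
)

// grokLine extracts named fields from a raw log line using a
// grok-style pattern expressed as a regexp with named groups.
func grokLine(re *regexp.Regexp, line string) map[string]string {
	fields := map[string]string{}
	match := re.FindStringSubmatch(line)
	if match == nil {
		return fields // no match: leave the event untouched
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			fields[name] = match[i]
		}
	}
	return fields
}

func main() {
	// Hypothetical legacy format: "<timestamp> <level> <message>".
	re := regexp.MustCompile(`^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$`)
	fields := grokLine(re, "2018-06-01T10:00:00Z ERROR connection refused")
	fmt.Println(fields)
	// map[level:ERROR message:connection refused timestamp:2018-06-01T10:00:00Z]
}
```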
As this runs contrary to Elastic's lightweight-shipper approach, I was wondering how to proceed next.
Make this a contribution? Or keep it as a separate fork?
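
For completeness, here is a rough sketch of the AVRO encoding step the new codec covers, using the goavro library (the schema is a made-up stand-in for our unified logging schema; the fork wires this into Filebeat's codec layer):

```go
package main

import (
	"fmt"
	"log"

	"github.com/linkedin/goavro/v2"
)

func main() {
	// Hypothetical unified logging schema; the real one is company-specific.
	const schema = `{
	  "type": "record",
	  "name": "LogEvent",
	  "fields": [
	    {"name": "timestamp", "type": "string"},
	    {"name": "level",     "type": "string"},
	    {"name": "message",   "type": "string"}
	  ]
	}`

	codec, err := goavro.NewCodec(schema)
	if err != nil {
		log.Fatal(err)
	}

	// An event as produced by the grok/JavaScript processors upstream.
	native := map[string]interface{}{
		"timestamp": "2018-06-01T10:00:00Z",
		"level":     "ERROR",
		"message":   "connection refused",
	}

	// Binary AVRO payload, ready to be pushed to the Kafka output.
	payload, err := codec.BinaryFromNative(nil, native)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("encoded %d bytes\n", len(payload))
}
```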


(Pier-Hugues Pellerin) #2

Hello @Lieven_Merckx, I think the best approach would be to open an issue on GitHub so we can discuss it there. We have been reluctant to include a grok processor inside Beats for performance reasons. Instead we have added dissect, which is a much faster way to tokenize strings, though not as complete as grok.
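
To make the performance point concrete, here is a minimal sketch of dissect-style tokenization: it consumes the string on literal delimiters, with no regular expressions and hence no backtracking (the tokenizer syntax is simplified relative to the real dissect processor):

```go
package main

import (
	"fmt"
	"strings"
)

// dissect applies a simplified tokenizer such as "%{ts} %{level} %{msg}":
// each key consumes the input up to the literal delimiter that follows it.
func dissect(tokenizer, line string) (map[string]string, bool) {
	fields := map[string]string{}
	parts := strings.Split(tokenizer, "%{")
	// parts[0] is the literal prefix before the first key (often empty).
	if !strings.HasPrefix(line, parts[0]) {
		return nil, false
	}
	rest := line[len(parts[0]):]
	for _, part := range parts[1:] {
		end := strings.Index(part, "}")
		if end < 0 {
			return nil, false // malformed tokenizer
		}
		key, delim := part[:end], part[end+1:]
		if delim == "" { // last key captures the remainder
			fields[key] = rest
			continue
		}
		idx := strings.Index(rest, delim)
		if idx < 0 {
			return nil, false
		}
		fields[key] = rest[:idx]
		rest = rest[idx+len(delim):]
	}
	return fields, true
}

func main() {
	fields, ok := dissect("%{ts} %{level} %{msg}",
		"2018-06-01T10:00:00Z ERROR connection refused")
	fmt.Println(ok, fields)
	// true map[level:ERROR msg:connection refused ts:2018-06-01T10:00:00Z]
}
```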

The JavaScript engine is something I hadn't thought about before. What kind of manipulation are you doing that requires complex scripting?


(Lieven Merckx) #3

Hi @pierhugues, these features were driven by the way we try to enable the development squads in our company. As we offer a central hosting service for Elastic, we also need to correlate log data across the whole chain (> 100 applications), so we defined a common logging data model. Centrally, however, you cannot handle the legacy log transformation, as we never standardized it before. We offer a logging pipeline through Kafka and impose this data model as a requirement (with a lot of optional fields, of course). So instead of doing all the log transformation centrally, we distribute that responsibility and enable the squads with this Filebeat version. Logstash is just too heavy to run distributed.
JavaScript is there for the cases where it's really difficult to extract the data with grok regexps alone; you have no idea what unstructured logs we find. We have also seen that it comes in handy (ExtraHop does this) when you need to map technical data to some business meaning, as in the sketch below.
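
To give a concrete idea of that kind of mapping, here is a rough sketch using the goja JavaScript interpreter for Go (the script and field names are made up; the fork's actual scripting hooks may differ):

```go
package main

import (
	"fmt"
	"log"

	"github.com/dop251/goja"
)

// Hypothetical user-supplied script: maps an internal return code
// to a business-level status that a grok pattern alone cannot derive.
const script = `
function process(event) {
    var code = parseInt(event["rc"], 10);
    event["order_status"] = (code === 0) ? "fulfilled" :
                            (code < 100) ? "retryable" : "failed";
    return event;
}
`

func main() {
	vm := goja.New()
	if _, err := vm.RunString(script); err != nil {
		log.Fatal(err)
	}
	process, ok := goja.AssertFunction(vm.Get("process"))
	if !ok {
		log.Fatal("script must define process(event)")
	}

	event := map[string]interface{}{"rc": "42", "app": "payments"}
	res, err := process(goja.Undefined(), vm.ToValue(event))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res.Export())
	// map[app:payments order_status:retryable rc:42]
}
```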
Elastic is a great product, but it requires that you think upfront about the data model and how you will use it.
As we also have a commercial contract with Elastic, I was wondering about the best way to proceed: via a GitHub issue or a ticket in the Elastic customer system?

