Filebeat with grok, JavaScript and AVRO codec


(Lieven Merckx) #1

In our company we unified our logging system around a single AVRO schema pushed through Kafka. As we needed a log shipper in combination with our ELK stack, we looked at Filebeat.
As each application decided on its own format in the past, the centralised approach made it necessary to remap the data. We also didn't want to handle that in a single central team, so we wanted the re-mapping and pattern matching to be distributed. We therefore extended Filebeat with two processors, grok and an embedded JavaScript engine, and added a new codec for the AVRO schema.
The result : https://github.com/vortex314/beats
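
For illustration, here is a minimal sketch of the kind of extraction such a grok processor performs, using Go's standard regexp package with named capture groups (the pattern and field names are invented for the example, not taken from the fork):

```go
package main

import (
	"fmt"
	"regexp"
)

// grokLine extracts named fields from a raw log line using a
// grok-style pattern expressed as a regexp with named groups.
func grokLine(re *regexp.Regexp, line string) map[string]string {
	fields := map[string]string{}
	match := re.FindStringSubmatch(line)
	if match == nil {
		return fields // no match: leave the event untouched
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			fields[name] = match[i]
		}
	}
	return fields
}

func main() {
	// Hypothetical legacy format: "<timestamp> <level> <message>".
	re := regexp.MustCompile(`^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$`)
	fields := grokLine(re, "2018-06-01T10:00:00Z ERROR connection refused")
	fmt.Println(fields)
	// map[level:ERROR message:connection refused timestamp:2018-06-01T10:00:00Z]
}
```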
As this runs contrary to Elastic's lightweight-shipper approach, I was wondering how to proceed next.
Make this a contribution? Or keep it as a separate fork?
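
For completeness, here is a rough sketch of the AVRO encoding step the new codec covers, using the goavro library (the schema is a made-up stand-in for our unified logging schema; the fork wires this into Filebeat's codec layer):

```go
package main

import (
	"fmt"
	"log"

	"github.com/linkedin/goavro/v2"
)

func main() {
	// Hypothetical unified logging schema; the real one is company-specific.
	const schema = `{
	  "type": "record",
	  "name": "LogEvent",
	  "fields": [
	    {"name": "timestamp", "type": "string"},
	    {"name": "level",     "type": "string"},
	    {"name": "message",   "type": "string"}
	  ]
	}`

	codec, err := goavro.NewCodec(schema)
	if err != nil {
		log.Fatal(err)
	}

	// An event as produced by the grok/JavaScript processors upstream.
	native := map[string]interface{}{
		"timestamp": "2018-06-01T10:00:00Z",
		"level":     "ERROR",
		"message":   "connection refused",
	}

	// Binary AVRO payload, ready to be pushed to the Kafka output.
	payload, err := codec.BinaryFromNative(nil, native)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("encoded %d bytes\n", len(payload))
}
```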


(Pier-Hugues Pellerin) #2

Hello @Lieven_Merckx, I think the best approach would be to open an issue on GitHub so we can discuss it there. We have been reluctant to include a grok processor inside Beats for performance reasons. Instead we have added dissect, which is a much faster way to tokenize strings, though not as complete as grok.
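
To make the performance point concrete, here is a minimal sketch of dissect-style tokenization: it consumes the string on literal delimiters, with no regular expressions and hence no backtracking (the tokenizer syntax is simplified relative to the real dissect processor):

```go
package main

import (
	"fmt"
	"strings"
)

// dissect applies a simplified tokenizer such as "%{ts} %{level} %{msg}":
// each key consumes the input up to the literal delimiter that follows it.
func dissect(tokenizer, line string) (map[string]string, bool) {
	fields := map[string]string{}
	parts := strings.Split(tokenizer, "%{")
	// parts[0] is the literal prefix before the first key (often empty).
	if !strings.HasPrefix(line, parts[0]) {
		return nil, false
	}
	rest := line[len(parts[0]):]
	for _, part := range parts[1:] {
		end := strings.Index(part, "}")
		if end < 0 {
			return nil, false // malformed tokenizer
		}
		key, delim := part[:end], part[end+1:]
		if delim == "" { // last key captures the remainder
			fields[key] = rest
			continue
		}
		idx := strings.Index(rest, delim)
		if idx < 0 {
			return nil, false
		}
		fields[key] = rest[:idx]
		rest = rest[idx+len(delim):]
	}
	return fields, true
}

func main() {
	fields, ok := dissect("%{ts} %{level} %{msg}",
		"2018-06-01T10:00:00Z ERROR connection refused")
	fmt.Println(ok, fields)
	// true map[level:ERROR msg:connection refused ts:2018-06-01T10:00:00Z]
}
```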

The JavaScript engine is something I hadn't thought about before. What kind of manipulation are you doing that requires complex scripting?


(Lieven Merckx) #3

Hi @pierhugues, these features were driven by the way we try to enable the development squads in our company. As we offer a central hosting service for Elastic, we also need to correlate log data across the whole chain (> 100 applications), so we defined a common logging data model. Centrally, however, you cannot handle the legacy log transformation, as we never standardized it before. We offer a logging pipeline through Kafka and impose this data model as a requirement (with a lot of optional fields, of course). So instead of doing all the log transformation centrally, we distribute that responsibility and enable the squads with this Filebeat version. Logstash is just too heavy to run distributed.
JavaScript is there for the cases where it's really difficult to extract the data with grok regexps alone; you have no idea what unstructured logs we find. We have also seen that it comes in handy (ExtraHop does this) when you need to map technical data to some business meaning, as in the sketch below.
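
To give a concrete idea of that kind of mapping, here is a rough sketch using the goja JavaScript interpreter for Go (the script and field names are made up; the fork's actual scripting hooks may differ):

```go
package main

import (
	"fmt"
	"log"

	"github.com/dop251/goja"
)

// Hypothetical user-supplied script: maps an internal return code
// to a business-level status that a grok pattern alone cannot derive.
const script = `
function process(event) {
    var code = parseInt(event["rc"], 10);
    event["order_status"] = (code === 0) ? "fulfilled" :
                            (code < 100) ? "retryable" : "failed";
    return event;
}
`

func main() {
	vm := goja.New()
	if _, err := vm.RunString(script); err != nil {
		log.Fatal(err)
	}
	process, ok := goja.AssertFunction(vm.Get("process"))
	if !ok {
		log.Fatal("script must define process(event)")
	}

	event := map[string]interface{}{"rc": "42", "app": "payments"}
	res, err := process(goja.Undefined(), vm.ToValue(event))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res.Export())
	// map[app:payments order_status:retryable rc:42]
}
```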
Elastic is a great product, but it requires that you think upfront about the data model and how you will use it.
As we also have a commercial contract with Elastic, I was wondering about the best way to proceed: via a GitHub issue or a ticket in the Elastic customer system?

