Grok in filebeat?

I would love to try out filebeat as a replacement for my current use of LogStash.
I like the idea of running a Go program instead of a JVM.

Replacing my use of the "file" input plugin to use filebeat would be easy for "tailing" the access logs.
However, I actually read a fair number of other inputs and use grok to filter out the noise as close to the data source as possible.

Further, I plan to have my LogStash output go over Kafka instead of going directly into ES.
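For reference, the Kafka leg of that plan is just an output block on the Logstash side; a minimal sketch, with the broker list and topic name as placeholders:

```conf
output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # placeholder broker list
    topic_id => "access-logs"                       # placeholder topic
  }
}
```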

On the face of it, Go should be able to do "grok" and "kafka" just as easily as Java (LogStash), but apparently that is not (currently) the case.
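For what it's worth, a grok pattern ultimately compiles down to a regular expression, and Go's standard library handles that fine via named capture groups. A minimal sketch (the pattern and sample line are my own illustration, not anything from real Filebeat code):

```go
package main

import (
	"fmt"
	"regexp"
)

// Grok-style parsing with stdlib regexp: named capture groups stand in
// for grok field names like %{IPORHOST:ip} and %{WORD:method}.
var accessLine = regexp.MustCompile(
	`^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)`)

// parse returns the named fields of a matching line, or nil if it doesn't match.
func parse(line string) map[string]string {
	m := accessLine.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := map[string]string{}
	for i, name := range accessLine.SubexpNames() {
		if name != "" {
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	f := parse(`127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326`)
	fmt.Println(f["ip"], f["method"], f["path"]) // prints: 127.0.0.1 GET /index.html
}
```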

I guess the question is what is the correct place to draw the line between LogStash and Filebeat?

In my experience, if you want to process the log lines at all, like storing values in fields or applying any filter, you will still need to use Logstash.

In ELK 5.0.0 there is something called an Ingest Node in Elasticsearch which allows you to do some pre-processing before the data gets indexed. But I haven't tried it, so I don't know exactly what you can and cannot do with it.

EDIT: Ingest Nodes allow you to apply grok filters
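Roughly, that means registering a pipeline with a grok processor and indexing through it; a minimal sketch, with the field and pattern assumed for illustration:

```json
PUT _ingest/pipeline/parse-access
{
  "description": "grok the raw message at index time",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:client} %{WORD:method} %{URIPATHPARAM:request}"]
      }
    }
  ]
}
```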

I think I didn't communicate my situation clearly.
I'm actually using "not_analyzed" for all my data.
I think ingest nodes are a way to do processing before indexing, but I'm not really doing indexing.
What I want is a way to do processing before transferring data from a node to the processing pipeline.

I'm afraid that asking for grok in filebeat is essentially the road to asking for LogStash to be ported from Java to Go -- something unlikely to happen even though Go is more efficient at runtime than Java is.

@iamthealex Beats are lightweight shippers and we try to keep them that way. Beats should do everything that is critical to do on the edge node. For example, filtering is important to cut the number of events that are sent and so reduce network traffic. If we start adding more and more processing features like grok to Beats, Beats will not be lightweight anymore.

We are aware that some people would like to do the groking on the client side. Our current recommendation here is to use LS on the edge nodes.
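For plain line-level noise, Filebeat can already drop events at the edge with `include_lines`/`exclude_lines` (lists of regexes); a small sketch, with the path and pattern as examples only:

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/app/*.log     # example path
      exclude_lines: ["^DEBUG"]  # lines matching this regex are dropped before shipping
```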


If you were to design LogStash from the ground up today, I imagine Go would be the better choice, since it has regular expressions in its standard library and doesn't require Ruby or a JVM. I understand your hesitation to maintain both a JVM/Ruby implementation (LogStash) and a competing faster but less battle-tested Go implementation (FileBeat or similar).

My specific example of wanting to use LogStash style processing before shipping "pointless" data is shipping /var/log/wtmp data.

  1. I have to use the Logstash exec input plugin to run the Linux "last" command to decode the file.
  2. I then use the split filter to get each line as a separate event.
  3. I then filter out any event that was already reported as "over" on a previous day.
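The steps above can be sketched as a Logstash config; the interval and the "still logged in" check are my own stand-ins for the real "already reported" bookkeeping:

```conf
input {
  exec {
    command => "last -F"   # decode /var/log/wtmp via the last(1) command
    interval => 3600
  }
}
filter {
  split { }                # one event per line of `last` output
  # crude stand-in for "already reported as over on a previous day":
  # keep only sessions that are still open
  if [message] !~ /still logged in/ {
    drop { }
  }
}
```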

This filtering results in a huge reduction in storage.
The savings are so immense that I cannot move to filebeat until I can do something similar with filebeat.

I would really love to have the shipper be lighter weight if at all possible.

That is an interesting question. My point of view here is that, assuming there were no Logstash, Beats would probably end up becoming more and more similar to Logstash, with all its pros and cons, meaning it would try to be a heavy server component and a lightweight shipper at the same time. I see this as independent of the language it is written in.

The beauty is that there is Logstash, which Beats can use for processing. So there is no need for Beats to take on all the complexity of Logstash, and Logstash is no longer required on the edge nodes.

I think your case with the wtmp file, which does not contain a log entry per line, should be handled either with a specific beat that understands the format or with a separate prospector in filebeat. Both would allow you to do the filtering and would potentially already structure the events.

I suppose that I could use FileBeat on the edge to send the (large) /var/log/wtmp to a cluster-local LogStash concentrator,
then use the LogStash concentrator to expand the wtmp data and drop the "old news" before forwarding the data on.

This scenario would give me the lightweight Beat on the edge,
filter out pointless data on a local concentrator node where the CPU hit won't matter so much,
and save me from sending large swaths of data across to my (not-local) data sea.
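The edge half of that setup would be little more than the following (hostname is a placeholder, and the concentrator's Logstash would need the beats input listening on 5044):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/wtmp   # shipped raw; decoded and filtered on the concentrator
output:
  logstash:
    hosts: ["ls-concentrator.internal:5044"]  # placeholder hostname
```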


I understand this topic has come up multiple times. In order to simplify deployment, it would help if Filebeat could do parsing as well and send data to Elasticsearch directly. With Logstash, we need an additional Logstash deployment and its maintenance. It would keep things really simple if the deployment were limited to either:
a) Filebeat parsing + Elasticsearch indexing, or
b) Filebeat forwarding + Elasticsearch ingest node.

Having different deployment models from which customers can choose will help, as they can pick what is best for them.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.