Feature request on Filebeat

Dear Filebeat developers, is this the right forum to send my wish list for Filebeat?

I would love to see Filebeat include more features from Logstash, e.g.:

  1. Parse the log message, e.g., extract the timestamp and log level from a log4j log message, and add separate fields for them.
  2. Drop required fields such as "@timestamp" and "log".
    If I understand correctly, the above functionality is achievable only through Logstash or the Elasticsearch Ingest node.

The reasons I would like Filebeat to have this functionality without having to use Logstash are:
(1) We try to keep what is deployed on edge servers as lightweight as possible. It seems silly to have to deploy two services on the edge server just to get simple log parsing before shipping out logs.
(2) From my limited experience, Filebeat's code quality is better and it is more reliable than Logstash. I encountered errors/exceptions when deploying and running Logstash, but my experience with Filebeat has been good so far.

If these feature requests won't be granted any time soon, could someone point me to how to work around this for now? I am willing to write simple Filebeat plugins myself.

thanks!
yan

We try to keep Filebeat as lightweight and simple as possible. That is the reason we leave out more complex features like processing / grokking log files. Logstash is the tool built for exactly that and includes all the features you need. In general you will have lots of Filebeat instances sending to very few Logstash instances for the processing, so there is no need to deploy both on the edge node.

I'm not sure how you judge code quality or reliability, but I don't think this is the case. The two are very different tools. If you encounter errors / exceptions with Logstash, I recommend posting the specific issues in the Logstash forum.

Dropping a field like @timestamp is not possible because it is the core thing that makes Filebeat work. In a time series (which logs normally are), each event must have a timestamp. Why do you want to drop it?

If you want to build the feature yourself as a plugin or similar, processors are built in a pluggable way, very similar to outputs. This means you can follow this comment to add your own processor / plugin: https://github.com/elastic/beats/pull/1525#issuecomment-217651768

@ruflin Thanks for the pointer!
The previous discussion thread you pointed to echoes why we would like to have Logstash functionality on the edge server where Filebeat is deployed, without the overhead of deploying Logstash on the same edge server.
We want to minimize the traffic transferred across the network, which is why we would like to drop any unnecessary fields using Logstash-style parse/filter functionality.
https://github.com/elastic/beats/pull/1525#issuecomment-217651768

Re @timestamp, it is NOT the real timestamp we want. @timestamp is the time when Filebeat processed the log entry in the log file; however, we need the timestamp when the log entry was produced (which is already recorded in the log message). @timestamp could therefore be minutes, hours, or even days later than the real timestamp when the log entry was generated.
Does this make sense to you? In summary, we wish to extract the timestamp from the log message, NOT the arbitrary timestamp when Filebeat processes the log file. We do NOT care about the @timestamp generated by Filebeat.
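
To illustrate the gap between the two timestamps, here is a minimal Python sketch. It assumes a hypothetical log4j-style line that starts with a timestamp such as `2016-05-10 12:34:56,789` and that the application writes times in UTC; the function names `event_time` and `shipping_lag` are made up for this example and are not part of Filebeat.

```python
from datetime import datetime, timezone

# Assumed (hypothetical) timestamp layout at the start of each log line,
# e.g. "2016-05-10 12:34:56,789 INFO ...". Adjust to the real log pattern.
EVENT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
EVENT_TIME_LENGTH = 23  # number of characters in the timestamp above


def event_time(line, tz=timezone.utc):
    """Return the time embedded at the start of the log line."""
    raw = line[:EVENT_TIME_LENGTH]
    return datetime.strptime(raw, EVENT_TIME_FORMAT).replace(tzinfo=tz)


def shipping_lag(line, processed_at):
    """Seconds between when the entry was written and when it was read."""
    return (processed_at - event_time(line)).total_seconds()
```

If the shipper only reads the file two hours after the entry was written, `shipping_lag` returns 7200 seconds, which is the difference described above between the processing time and the time the entry was produced.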

I prefer Filebeat over Logstash for two reasons: (1) we encountered multiple errors/exceptions when using Logstash, but no problems when using Filebeat; (2) Filebeat developers are much more responsive to the questions I post in the forum. I did post my question and exception stack trace to the Logstash forum, but got no response. I also filed a bug report on the Logstash GitHub, and only got a response saying it is a duplicate of a previous bug, with NO ETA for a fix.
I certainly expect a much better response from high-quality, reliable software. Maybe I expect too much :)

In any case, I really appreciate you guys responding to my questions, which is exactly why I like Filebeat much better than Logstash :)

If you need the above functionality, I recommend logging directly as JSON. Filebeat can then support this use case.
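
As a rough sketch of what "logging directly as JSON" could look like on the application side: the example below uses Python's standard `logging` module and writes one JSON object per line. The original poster's application uses a Java logging library, so this only shows the idea; the class name `JsonLineFormatter`, the helper `build_logger`, and the key names are assumptions chosen for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as one JSON object on a single line."""

    def format(self, record):
        event = {
            # Time the record was created by the application, in UTC.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(event)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

With the timestamp and level already written as separate JSON keys, no parsing step is needed at the edge. Filebeat's JSON decoding settings (for example `json.keys_under_root`) would then decide how those keys appear in the shipped event; check the documentation for your Filebeat version.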

Logstash is a much more complex product, precisely because of all the features it supports. We want to keep Filebeat as simple as possible, which includes pushing back on feature requests. We are aware that this is a trade-off for some users.

Sorry for reviving this thread again. I understand that you guys intentionally keep Filebeat simple.
I am just wondering whether it would be possible or easy to plug in an "awk"-like preprocessor to get basic parsing functionality.
It would be nice if we could run a simple awk script on the log input: instead of tailing a log file directly, first run an awk script on the log file and send its output to Filebeat.
Is it easy to set this up with the current Filebeat? Any pointer is appreciated!
yan

@filebeater You could send your data from awk to stdout and read it in Filebeat from stdin. Would that work?
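
A minimal sketch of such a preprocessor, written in Python rather than awk: it reads log lines on stdin, splits out the timestamp and level, and writes one JSON object per line to stdout. The line layout in `LINE_RE` is an assumed log4j-style pattern, and `parse_line` and `main` are names made up for this example; adapt both to the real log format.

```python
import json
import re
import sys

# Assumed (hypothetical) log4j-style layout:
#   2016-05-10 12:34:56,789 INFO  com.example.Service - message text
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into timestamp, level, and message fields."""
    text = line.rstrip("\n")
    match = LINE_RE.match(text)
    if match is None:
        # Lines that do not fit the pattern are passed through unparsed.
        return {"message": text}
    return match.groupdict()


def main(source=sys.stdin, sink=sys.stdout):
    """Read raw lines from source and write one JSON event per line."""
    for line in source:
        sink.write(json.dumps(parse_line(line)) + "\n")
        sink.flush()  # keep latency low when used in a pipe


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `parse_log.py`, it could sit in a pipe like `tail -F app.log | python parse_log.py | filebeat -e`, with Filebeat configured to read from stdin. One trade-off to keep in mind: when reading from a pipe, Filebeat cannot track its position in the original file, so a restart of any stage may lose or repeat lines.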