I suspect the answer to this is that "filebeat can't do that", but figured I'd ask anyway in case I'm missing some feature.
What I'm trying to do is harvest docker container logs and redirect them to different files based off of some attribute.
For example, if I got the following log line:
{"log":"some output","stream":"stdout","attrs":{"jobid":"11"},"time":"2016-12-08T23:22:40.132270393Z"}
I would like to direct this to an output file named "jobid-11.log", based on the jobid in the above json.
The problems I see with this are:
1. Filebeat doesn't seem to be able to parse fields into any sort of variable; it only has limited parsing ability for filtering purposes.
2. Filebeat doesn't seem to have any concept of variables that could be used to change the output file name.
Even if #1 is true, I could still make this work if I could take something like the "source" output field and redirect to a different file based on a substring of it (the container id), but that still runs into #2.
Is filebeat simply the wrong match for what I'm doing? I'm sure I could get logstash to do this, but I was looking for a lighter weight solution.
What's the use case for output to a file? You can't dynamically control the output filename, but you can set options for other outputs dynamically.
You can parse structured JSON logs in Filebeat. Then you can choose an ES index or a Kafka topic based on something in the event using a format string.
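As a sketch of what that looks like, a Filebeat config along these lines parses the docker JSON and routes events by field (the Kafka topic format string is documented behavior; the paths and field names here are just illustrative, and exact option names vary by Filebeat version):

```yaml
filebeat.prospectors:
  - paths:
      - /var/lib/docker/containers/*/*-json.log
    # Decode each line as JSON and lift the keys to the top level of the event.
    json.keys_under_root: true
    json.add_error_key: true

output.kafka:
  hosts: ["localhost:9092"]
  # The topic accepts a format string referencing event fields.
  topic: '%{[attrs.jobid]}'
```

The file output, by contrast, takes a plain filename, which is the limitation being discussed here.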
My use case is a little odd because our requirements don't allow us to ship these logs off the host that they ran on. I'm just moving them out of the docker container directory so that I can remove containers and still have access to the logs. I need to remove containers earlier than the logs because containers can take up a lot of disk space.
In any event, I'm guessing that what I tried above should work, and that either I'm getting the syntax wrong or it wasn't implemented for that specific output (which is described as being "for testing"). I'm tempted to look at the code and suggest a fix, but I'll wait to see if anyone has further comments here first.
I looked into the source briefly, and I was correct: it's not parsing out any tokens from the name string. Since the fileOutput object uses FileRotator to do the file IO, the write path would need to change so that rotator.Name substitutes any tokens it matched in the JSON data.
But this would get messy: it would need to open the (potentially differently named) file on each writeLine instead of once in FileRotator's Rotate(), and track the sizes per file, so this is not a trivial change.
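To illustrate the complication, here is a hypothetical sketch (not actual libbeat code — `resolveName` and `tokenRotator` are names I made up): once the filename depends on event fields, every write must resolve the name and sizes must be tracked per resolved file rather than in a single counter.

```go
// Sketch of per-event filename resolution and why it complicates rotation.
package main

import (
	"fmt"
	"strings"
)

// resolveName substitutes %{field} tokens in a filename pattern with
// values extracted from the event.
func resolveName(pattern string, fields map[string]string) string {
	out := pattern
	for k, v := range fields {
		out = strings.ReplaceAll(out, "%{"+k+"}", v)
	}
	return out
}

// tokenRotator tracks bytes written per resolved filename; a single size
// counter (as the current FileRotator keeps) no longer suffices.
type tokenRotator struct {
	pattern string
	sizes   map[string]int
}

// writeLine resolves the target filename for this event and records its size.
// A real implementation would also have to open (or cache) the file here,
// instead of once at rotation time.
func (r *tokenRotator) writeLine(fields map[string]string, line string) string {
	name := resolveName(r.pattern, fields)
	r.sizes[name] += len(line) + 1 // +1 for the trailing newline
	return name
}

func main() {
	r := &tokenRotator{pattern: "jobid-%{jobid}.log", sizes: map[string]int{}}
	fmt.Println(r.writeLine(map[string]string{"jobid": "11"}, "some output"))
}
```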
OTOH, I got this working with logstash in less than an hour - but with logstash swallowing up 7% of the host's memory due to the JVM, this isn't a realistic option either.
I would be willing to dive in and "fix" this to parse out tokens in the filename if this was likely to get picked up by the project. I'll try opening an issue on the github and see where that goes.
"Format strings" and "string" fields are two different types of strings. The file output is using plain strings only for filenames on purpose, as the file output also implements log-rotation.
Adding support for filenames via "format strings" will complicate the file output I guess. Feel free to open a feature request.