We're trying to parse the logs that Docker's json-file logger produces in the following format:
{"log":"{\"msg\":\"Event started\", \"ts\":\"2016-08-30:01:02:03.000\"}"}
If you have thousands of Docker containers all writing logs in a similar (but not exactly the same) format, would it be sound to implement the unmarshalling of the stringified JSON, and its conversion to a JSON object, in Filebeat, so that the Logstash configuration can be kept as simple as possible?
Or should something like this be implemented in Logstash?
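To make the question concrete, here is a rough sketch in Python of the unwrapping I mean. It is only an illustration of the two-step decode, not Filebeat or Logstash code; the function name `unwrap_docker_log` is made up for this example, and it assumes the inner quotes are backslash-escaped the way Docker writes them to disk.

```python
import json


def unwrap_docker_log(line: str) -> dict:
    """Decode one json-file line; if "log" holds a JSON object as a string, decode that too."""
    outer = json.loads(line)          # first level: the Docker wrapper
    raw = outer.get("log")
    if isinstance(raw, str):
        try:
            inner = json.loads(raw)   # second level: the application's own JSON
        except ValueError:            # not JSON (plain text line): leave it untouched
            inner = None
        if isinstance(inner, dict):
            outer["log"] = inner
    return outer


# Sample line in the shape shown above (hypothetical input for illustration).
line = r'{"log":"{\"msg\":\"Event started\", \"ts\":\"2016-08-30:01:02:03.000\"}"}'
event = unwrap_docker_log(line)
print(event["log"]["msg"])
```

Plain-text lines pass through unchanged, so services that do not log JSON would not break; the open question is only where this second decode should live.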
At some point we have to stop with the feature bloat of Filebeat. Parsing the first level of JSON is fine, but IMHO supporting nested JSON is too much.
filebeat is mostly a shipper. If you need more customized processing, logstash is the way to go.
Personally I'm no big fan of Docker-based logging, especially the json-file logger. Problems with Docker logging are: by default the JSON log file grows infinitely (it is not bounded); the log file is deleted if the container gets deleted (did you forward all logs yet?); JSON in JSON(?); how about multiline JSON in JSON(?); and all logs are captured and forwarded through workers in the Docker daemon itself (do not start hundreds of containers, or watch memory usage!).
Fortunately or unfortunately, depending on how you see it, we already use Docker-based JSON logging, and changing that is going to require some big operational overhead. Also, with regard to keeping Filebeat light on features, I see your concern: you don't want Logstash-like functionality implemented in the shipper itself. However, since Docker's json-file format, which wraps JSON as strings (something that I'm not extremely fond of), is fairly common, I wouldn't completely rule out the shipper owning some nested JSON parsing.
Given that we already have logs from ~200 different services being pushed into ES, implementing this JSON unwrapping in Filebeat is a trade-off that we may have to make in order to avoid changing all the Logstash filters.
Shall keep you posted on the approach I take for this.
Btw, Elasticsearch 5.0 gains support for ingest node (a subset of Logstash filters). While a JSON filter is not yet implemented, you might want to watch this ticket.