I have machines with many Docker containers and I want to find a way to centralise the container logs into one Beats app that will be installed on each machine (with multiple containers).
So my question is: how can I make sure Beats gets every log line from each Docker container?
There are multiple solutions for getting logs out of Docker containers. Just pushing stdout to some app via stdin might not be the most robust one.
Use the Docker syslog logging driver with logging tags to add e.g. the container ID/name to messages, and forward them to a syslog daemon or directly to Logstash (which has a syslog input; use grok for parsing). Alternatively, have the syslog daemon write to files and use e.g. Filebeat to forward the logs (this reduces the chance of losing log lines).
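As a rough sketch, the syslog-driver variant might look like this (the syslog address and the image are placeholders for your own setup):

```shell
# Route the container's stdout/stderr through the host's syslog daemon,
# tagging each message with the container name and ID so it can be
# identified later in Logstash.
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=udp://127.0.0.1:514 \
  --log-opt tag="{{.Name}}/{{.ID}}" \
  nginx
```

The `tag` option accepts Go template markup such as `{{.Name}}`, `{{.ID}}` and `{{.ImageName}}`, so each message carries the container identity.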
Use a log shipper (Filebeat) to forward JSON logs from Docker containers (see the json-file log driver for proper configuration), plus Logstash to parse the log lines.
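A minimal Filebeat 5.x configuration for this approach could look like the following sketch (the paths and the Logstash host are assumptions for a default Docker install; adjust to your environment):

```yaml
filebeat.prospectors:
  - input_type: log
    # Docker's json-file driver writes one JSON object per line here.
    paths:
      - /var/lib/docker/containers/*/*-json.log
    # Decode each line as JSON and lift the fields to the top level;
    # "log" is the key Docker uses for the actual message text.
    json.message_key: log
    json.keys_under_root: true
    json.add_error_key: true

output.logstash:
  hosts: ["logstash.example.com:5044"]
```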
There are many more options for shipping your logs, e.g. having the shipper installed in the image, running on the host, or itself running in a container...
The problem is:
1- If the local syslog dies, Docker logs are lost; if the remote Logstash dies, logs are lost
2- JSON logs have the same problem
We are using the Docker fluentd log driver to send the logs to fluentd and then on to Kafka. This setup did not lose logs when any of the pieces crashed and was later restarted... but it is a bit of a pain: it uses yet another tool (fluentd), the Docker fluentd driver misses adding proper tags for the machine/logs, and there is no multiline support, so it is a lot harder to parse the logs later in Logstash.
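For reference, our setup is roughly the following (the address, tag, and image here are examples, not our exact values):

```shell
# Ship the container's output to a local fluentd instance, which in turn
# forwards to Kafka. fluentd-async-connect lets the container start (and
# Docker buffer its log lines) even if fluentd is not reachable yet.
docker run -d \
  --log-driver=fluentd \
  --log-opt fluentd-address=127.0.0.1:24224 \
  --log-opt fluentd-async-connect=true \
  --log-opt tag="docker.{{.Name}}" \
  nginx
```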
What is needed is a dockerlogsbeat driver: something that grabs the Docker output (JSON or text), adds the needed tags, and forwards it to Logstash/Kafka/Elasticsearch.
I think the fluentd setup also loses logs in case fluentd is down, right? Because Docker will only buffer a limited number of log lines.
We currently recommend two possibilities:
Use the default JSON log driver, and use Filebeat to decode the JSON objects and ship the files from the host. I'm not sure why you say this can lose logs in case of Logstash restarts? It shouldn't, IMO.
Use local syslog to write to files on disk, then Filebeat to ship them to Logstash. This should also not lose log lines.
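A hedged sketch of that second setup, assuming rsyslog and containers started with a syslog tag beginning with "docker" (the paths and filter here are illustrative, not a tested config):

```
# /etc/rsyslog.d/30-docker.conf
# Write messages whose syslog tag starts with "docker" to per-tag files,
# which Filebeat can then tail with a wildcard path.
$template DockerFile,"/var/log/docker/%programname%.log"
:syslogtag, startswith, "docker" ?DockerFile
& stop
```

Filebeat would then get a prospector pointed at something like `/var/log/docker/*.log`.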
In addition, we're thinking of using an approach similar to LogSpout in Filebeat, but this depends on some Filebeat refactoring.
From what we tested, if fluentd is down it does not lose logs... of course there should be a memory limit in the Docker daemon for the log buffer, but in our tests it was enough to let us do upgrades, or for the Docker orchestrator to detect the missing container and restart it.
For the JSON log, officially the json log file is not meant to be consumed by external tools: https://github.com/docker/docker/issues/29680
If you tail that file anyway, you need to wildcard the Docker container name and, at the very least, restart Filebeat whenever you create a new container so it can detect the new container directory and start reading the logs... not very friendly, especially for automated Docker deployments.
For syslog, again: if the local syslog crashes, you lose events while it is down.
Maybe with the new plugin support in docker, we can finally add more log options, like proper json log files, logstash, kafka and beats.
I would say that JSON logs on disk/socket/pipe/whatever, with a proper (file/socket/pipe/whatever)beat auto-config, would be perfect and would give us everything we need.
Filebeat 5.3 has prospector configuration reloading (for now, as a beta feature): https://www.elastic.co/guide/en/beats/filebeat/5.3/filebeat-configuration-reloading.html
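For completeness, the reload feature from that page looks roughly like this in `filebeat.yml` (the directory name is an example):

```yaml
# Pick up prospector definitions dropped into configs/ while Filebeat runs,
# e.g. one small .yml file per newly created container.
filebeat.config.prospectors:
  path: ${path.config}/configs/*.yml
  reload.enabled: true
  reload.period: 10s
```

With this, an orchestrator could write a prospector file when it starts a container instead of restarting Filebeat for each new container directory.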
I didn't realize the fluentd logging driver in Docker would behave better than the syslog one; that's interesting.