Collecting logs from docker hosts

Hey

I'd like some advice on whether my current setup is the right way to go. My main goal is to make sure that logs are not lost when some part of the system is temporarily unavailable. Thanks in advance for your help.

  1. Containers use the default logging driver (json-file) with max-size and max-file set (see the sketch below). All applications log to stdout in JSON format.
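
For reference, these options can be set host-wide in /etc/docker/daemon.json; a minimal sketch (the sizes here are made-up examples, tune them to how much on-disk buffer you want):

    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "10m",
        "max-file": "5"
      }
    }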

  2. On each host, I run a filebeat container (how I start it is sketched below the config) with the following config:

    filebeat.autodiscover:
      providers:
        - type: docker
          templates:
            - condition:
                equals:
                  docker.container.image: <my-image-name>  # some comments below
              config:
                - type: docker
                  json.message_key: event
                  json.keys_under_root: true
                  containers.ids:
                    - "${data.docker.container.id}"

    output.logstash:
      hosts: ["logstash:5044"]
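
In case it helps, this is roughly how the container itself can be started with the stock Elastic image (the named volume keeps filebeat's registry, so read offsets survive container restarts; the mounts give it the docker socket for autodiscover and the host's container logs):

    docker run -d \
      --name filebeat \
      --user root \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
      -v "$PWD/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
      -v filebeat-data:/usr/share/filebeat/data \
      docker.elastic.co/beats/filebeat:6.2.2
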
  3. One logstash instance writing to an Elasticsearch cluster:
    input {
      beats {
        port => 5044
      }
    }

    filter {
        mutate {
          # the docker field carries a lot of metadata I don't need, so I keep only the two fields I actually use
          add_field => {
            "rancher_docker_image" => "%{[docker][container][image]}"
            "rancher_stack_service" => "%{[docker][container][labels][io.rancher.stack_service.name]}"
          }
          remove_field => [ "docker" ]
        }
    }

    output {
      elasticsearch { hosts => ["elasticsearch:9200"] }
    }

If I understand it correctly, it should work like this:

  • if an application starts logging before filebeat is available, filebeat will tail the log file and catch up on everything it missed
  • if logstash is not available, filebeat will wait
  • the "buffer" size is defined by the max-size and max-file logging driver options
  • applications log asynchronously (that is, they are not blocked by the logging infrastructure)
  • in case of performance problems, logstash can be scaled by adding more instances (either behind a load balancer or with filebeat balancing between them, as sketched after this list)
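
For that last point, filebeat can do the balancing itself; a sketch of what the output section would become (the second hostname is hypothetical):

    output.logstash:
      hosts: ["logstash1:5044", "logstash2:5044"]
      loadbalance: true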

Questions:

  1. Any suggestions on the setup described above? Is there a better way to handle this? I've already considered the gelf driver (UDP packets, so logs can be lost), syslog (can block apps when it's unavailable, plus more parsing), and logspout (it was losing logs from the time it was down). This one seems to be the best option for now.
  2. I wrote the condition based on the image name, but eventually I'd like to use labels (roughly as sketched below). However, I was not able to make it work with filebeat-6.2.2. Does that mean https://github.com/elastic/beats/pull/6412 is not yet included?
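
What I have in mind is something like this (the label name is made up for illustration):

    - condition:
        equals:
          docker.container.labels.logging: "enabled"  # hypothetical opt-in label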

Hi @jbacic,

Your understanding of how all this works looks good to me :slight_smile:. Docker writes to files, and that's your buffer as of today. Filebeat picks logs up from there and keeps a registry of read offsets, so it can recover from restarts.

You may also want to have a look at Logstash persistent queues to buffer on disk on the Logstash node: Persistent queues (PQ) | Logstash Reference [8.11] | Elastic.
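
Enabling them comes down to a couple of settings in logstash.yml, something like this (the size limit is just an example):

    queue.type: persisted
    queue.max_bytes: 1gb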

  1. The overall architecture looks good to me; I would love to hear about your experience if you go live with it :slight_smile:

  2. The fix hasn't gone out yet; you will have to wait for 6.2.3 or build your own image. Also take into account that dotted labels are currently an issue; we are working on a new fix for those.

Best regards
