Filebeat slow to send logs to Logstash

Hello all,

Logstash & Filebeat version: 5.5.1
I am new to the ELK stack. I have set up an Elasticsearch cluster on k8s. In the beginning, Filebeat was very slow to send logs to Logstash (it took hours), and the Filebeat logs showed:

2019/12/13 10:15:07.798513 sync.go:85: ERR Failed to publish events caused by: read tcp 10.1.94.36:41834->10.0.59.193:5044: i/o timeout
2019/12/13 10:15:07.798527 single.go:91: INFO Error publishing events (retrying): read tcp 10.1.94.36:41834->10.0.59.193:5044: i/o timeout

After some research, I applied the following modifications to the Filebeat and Logstash configurations.

Logstash config:

congestion_threshold => 180
client_inactivity_timeout => 1200

Filebeat config:

scan_frequency: 30s
bulk_max_size: 2048

After these changes, more events could be sent, and the window size sometimes grew up to 400 (it never goes beyond 400, even though the limit is 2048). Before these changes, the window size never grew beyond 10.
I also noticed that Filebeat is constantly sending new events, even when not many logs are being generated. Is this normal behavior?
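
For reference, this is roughly the extra output.logstash tuning I understand is available in Filebeat 5.x. I have not applied it yet, and the values below are only examples for discussion, not something I have tested:

output.logstash:
  hosts: ["logstash:5044"]
  bulk_max_size: 2048
  timeout: 30           # allow more time before the i/o timeout fires
  worker: 2             # parallel connections per configured host
  compression_level: 3  # trade some CPU for less data on the wire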

Below are the current configs and logs of filebeat and logstash.

**Filebeat config:**
apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.prospectors:
    - input_type: log
      paths:
        - /var/log/containers/*.log
      scan_frequency: 30s
      symlinks: true
      json.message_key: log
      json.keys_under_root: true
      json.add_error_key: true
      fields_under_root: true
      fields:
        type: kube-logs
        node.hostname: ${NODE_HOSTNAME}
        pod.ip: ${POD_IP}
      tags:
          - k8s-app
    filebeat.config.modules:
      # Set to true to enable config reloading
      reload.enabled: true
    output.logstash:
      hosts: logstash:5044
      timeout: 15
      bulk_max_size: 2048
      ssl.certificate_authorities: ["/usr/share/elasticsearch/config/tls/ca.crt"]
      ssl.certificate: "/usr/share/elasticsearch/config/tls/filebeat.crt"
      ssl.key: "/usr/share/elasticsearch/config/tls/filebeat.key"
      ssl.key_passphrase: ${APP_KEYSTORE_PASSWORD}
    logging.level: debug
kind: ConfigMap
metadata:
  creationTimestamp: 2019-12-12T13:06:29Z
  labels:
    app: logging-elk-filebeat-ds
    chart: ibm-icplogging-2.2.0
    component: filebeat-ds
    heritage: Tiller
    release: logging
  name: logging-elk-filebeat-ds-config
  namespace: kube-system
  resourceVersion: "91594477"
  selfLink: /api/v1/namespaces/kube-system/configmaps/logging-elk-filebeat-ds-config
  uid: 35ed571f-1ce0-11ea-b481-506b8dfb518b

Filebeat logs:

2019/12/13 10:26:17.025879 output.go:109: DBG  output worker: publish 2048 events
2019/12/13 10:26:17.025891 sync.go:107: DBG  Try to publish 2048 events to logstash with window size 120
2019/12/13 10:26:19.057002 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=1871 libbeat.logstash.publish.write_bytes=27259 libbeat.logstash.published_and_acked_events=247 libbeat.publisher.published_events=2048 publish.events=2048 registrar.states.current=1 registrar.states.update=2048 registrar.writes=1
2019/12/13 10:26:23.399544 prospector.go:183: DBG  Run prospector
2019/12/13 10:26:23.399579 prospector_log.go:70: DBG  Start next scan
2019/12/13 10:26:23.400226 prospector_log.go:226: DBG  Check file for harvesting: /var/log/containers/nvidia-device-plugin-bc5x8_kube-system_nvidia-device-plugin-94ad7e80e7f62375df450bfb950f1bc2845dd63851af8e87dee32fc68229b3f6.log
2019/12/13 10:26:23.400256 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/nvidia-device-plugin-bc5x8_kube-system_nvidia-device-plugin-94ad7e80e7f62375df450bfb950f1bc2845dd63851af8e87dee32fc68229b3f6.log, offset: 140
2019/12/13 10:26:23.400263 prospector_log.go:313: DBG  File didn't change: /var/log/containers/nvidia-device-plugin-bc5x8_kube-system_nvidia-device-plugin-94ad7e80e7f62375df450bfb950f1bc2845dd63851af8e87dee32fc68229b3f6.log
2019/12/13 10:26:23.400271 prospector_log.go:226: DBG  Check file for harvesting: /var/log/containers/monitoring-prometheus-nodeexporter-t5bsq_kube-system_nodeexporter-f14734f3c57c2b266eeb058b4e32dbc78fcfaa63ad04e9deb20634aa60e66e84.log
2019/12/13 10:26:23.400279 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/monitoring-prometheus-nodeexporter-t5bsq_kube-system_nodeexporter-f14734f3c57c2b266eeb058b4e32dbc78fcfaa63ad04e9deb20634aa60e66e84.log, offset: 5605
2019/12/13 10:26:23.400284 prospector_log.go:313: DBG  File didn't change: /var/log/containers/monitoring-prometheus-nodeexporter-t5bsq_kube-system_nodeexporter-f14734f3c57c2b266eeb058b4e32dbc78fcfaa63ad04e9deb20634aa60e66e84.log
2019/12/13 10:26:23.400291 prospector_log.go:226: DBG  Check file for harvesting: /var/log/containers/calico-node-tgxkl_kube-system_install-cni-e65fb542a64c3fe9b3d1de82fbe5e5b79c1cc0cb23e1911156a28855a0f8952f.log
2019/12/13 10:26:23.400297 prospector_log.go:259: DBG  Update existing file for harvesting: /var/log/containers/calico-node-tgxkl_kube-system_install-cni-e65fb542a64c3fe9b3d1de82fbe5e5b79c1cc0cb23e1911156a28855a0f8952f.log, offset: 3708
2019/12/13 10:26:23.400302 prospector_log.go:313: DBG  File didn't change: /var/log/containers/calico-node-tgxkl_kube-system_install-cni-e65fb542a64c3fe9b3d1de82fbe5e5b79c1cc0cb23e1911156a28855a0f8952f.log
2019/12/13 10:26:23.400308 prospector_log.go:226: DBG  Check file for harvesting: /var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log
2019/12/13 10:26:23.400320 prospector_log.go:245: DBG  Start harvester for new file: /var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log
2019/12/13 10:26:26.650513 sync.go:78: DBG  120 events out of 2048 events sent to logstash. Continue sending
2019/12/13 10:26:26.650545 sync.go:107: DBG  Try to publish 1928 events to logstash with window size 120
2019/12/13 10:26:31.125702 sync.go:78: DBG  120 events out of 1928 events sent to logstash. Continue sending
2019/12/13 10:26:31.125744 sync.go:107: DBG  Try to publish 1808 events to logstash with window size 120
2019/12/13 10:26:36.691001 sync.go:78: DBG  120 events out of 1808 events sent to logstash. Continue sending
2019/12/13 10:26:36.691041 sync.go:107: DBG  Try to publish 1688 events to logstash with window size 120
2019/12/13 10:26:48.675369 sync.go:78: DBG  120 events out of 1688 events sent to logstash. Continue sending
2019/12/13 10:26:48.675407 sync.go:107: DBG  Try to publish 1568 events to logstash with window size 120
2019/12/13 10:26:49.056948 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=140 libbeat.logstash.publish.write_bytes=33934
2019/12/13 10:26:55.722936 sync.go:78: DBG  120 events out of 1568 events sent to logstash. Continue sending
2019/12/13 10:26:55.722971 sync.go:107: DBG  Try to publish 1448 events to logstash with window size 120
2019/12/13 10:27:00.115661 sync.go:78: DBG  120 events out of 1448 events sent to logstash. Continue sending
2019/12/13 10:27:00.115704 sync.go:107: DBG  Try to publish 1328 events to logstash with window size 120
2019/12/13 10:27:13.896299 sync.go:78: DBG  120 events out of 1328 events sent to logstash. Continue sending
2019/12/13 10:27:13.896345 sync.go:107: DBG  Try to publish 1208 events to logstash with window size 120
2019/12/13 10:27:19.056918 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=105 libbeat.logstash.publish.write_bytes=24952
2019/12/13 10:27:24.090525 sync.go:78: DBG  120 events out of 1208 events sent to logstash. Continue sending
2019/12/13 10:27:24.090558 sync.go:107: DBG  Try to publish 1088 events to logstash with window size 120
2019/12/13 10:27:30.011920 sync.go:78: DBG  120 events out of 1088 events sent to logstash. Continue sending
2019/12/13 10:27:30.011959 sync.go:107: DBG  Try to publish 968 events to logstash with window size 120
2019/12/13 10:27:39.834750 sync.go:78: DBG  120 events out of 968 events sent to logstash. Continue sending
2019/12/13 10:27:39.834781 sync.go:107: DBG  Try to publish 848 events to logstash with window size 120
2019/12/13 10:27:49.060311 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=105 libbeat.logstash.publish.write_bytes=24718
^C2019/12/13 10:27:54.838827 client.go:194: DBG  handle error: read tcp 10.1.94.36:42626->10.0.59.193:5044: i/o timeout
2019/12/13 10:27:54.838872 sync.go:78: DBG  0 events out of 848 events sent to logstash. Continue sending
2019/12/13 10:27:54.838882 sync.go:58: DBG  close connection
2019/12/13 10:27:54.838889 client.go:110: DBG  closing
2019/12/13 10:27:54.839175 sync.go:85: ERR Failed to publish events caused by: read tcp 10.1.94.36:42626->10.0.59.193:5044: i/o timeout
2019/12/13 10:27:54.839203 single.go:91: INFO Error publishing events (retrying): read tcp 10.1.94.36:42626->10.0.59.193:5044: i/o timeout
2019/12/13 10:27:54.839211 sync.go:58: DBG  close connection
2019/12/13 10:27:54.839230 single.go:156: DBG  send fail
2019/12/13 10:27:54.839237 single.go:160: DBG  reset fails
2019/12/13 10:27:55.839410 sync.go:53: DBG  connect
2019/12/13 10:27:55.859760 sync.go:107: DBG  Try to publish 848 events to logstash with window size 60
2019/12/13 10:27:56.677498 window.go:51: DBG  update current window size: 60
2019/12/13 10:27:56.677542 sync.go:78: DBG  60 events out of 848 events sent to logstash. Continue sending
2019/12/13 10:27:56.677556 sync.go:107: DBG  Try to publish 788 events to logstash with window size 90

Logstash config:

apiVersion: v1
data:
  k8s.conf: |-
    input {
      beats {
        port => 5044
        ssl => true
        congestion_threshold => 180
        client_inactivity_timeout => 1200
        ssl_certificate_authorities => ["/usr/share/elasticsearch/config/tls/ca.crt"]
        ssl_certificate => "/usr/share/elasticsearch/config/tls/logstash.crt"
        ssl_key => "/usr/share/elasticsearch/config/tls/logstash.key"
        ssl_key_passphrase => "${APP_KEYSTORE_PASSWORD}"
        ssl_verify_mode => "force_peer"
      }
    }

    filter {
      if [type] == "kube-logs" {
        mutate {
          rename => { "message" => "log" }
          remove_field => ["host"]
        }

        date {
          match => ["time", "ISO8601"]
        }

        # Sample:
        # tiller-1279707952-sgd08_kube-system_tiller-0c51282d195d1c21307f6a48b9118d690441cda498fc5a2790c043407eab235b.log
        # filebeat-j357d_default_filebeat-1a3113e374ad7937a3efa36c4bb42b46b976bcd7cd96223e6b9e6e3df08d802a.log
        # appstdoutpod_default_app-stdout-container-01c2e7a7b105d9141825ea3ae5634b580fdd20a5a4ee890cdbe0816ca002623c.log
        # unified-router-4047118581-sm913_kube-system_unified-router-ddda8a8cbcb74c45b64a4b18997b4f2928c998a37e45037cd0304eaa56d1634f.log
        dissect {
          mapping => {
            "source" => "/var/log/containers/%{kubernetes.pod}_%{kubernetes.namespace}_%{container_file_ext}"
          }
        }

        dissect {
          mapping => {
            "container_file_ext" => "%{container}.%{?file_ext}"
          }
          remove_field => ["host", "container_file_ext"]
        }

        grok {
          "match" => {
            "container" => "^%{DATA:kubernetes.container_name}-(?<kubernetes.container_id>[0-9A-Za-z]{64,64})"
          }
          remove_field => ["container"]
        }
      }
    }

    filter {
      # Drop empty lines
      if [log] =~ /^\s*$/ {
        drop { }
      }
      # Attempt to parse JSON, but ignore failures and pass entries on as-is
      json {
        source => "log"
        skip_on_invalid_json => true
      }
    }

    output {
      elasticsearch {
        hosts => "elasticsearch:9200"
        index => "logstash-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
        ssl => true
        ssl_certificate_verification => true
        keystore => "/usr/share/elasticsearch/config/tls/logstash-elasticsearch-keystore.jks"
        keystore_password => "${APP_KEYSTORE_PASSWORD}"
        truststore => "/usr/share/elasticsearch/config/tls/truststore.jks"
        truststore_password => "${CA_TRUSTSTORE_PASSWORD}"
      }
    }
  logstash.yml: |-
    config.reload.automatic: true
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
    xpack.monitoring.enabled: false
    log.level: debug
kind: ConfigMap
metadata:
  creationTimestamp: 2019-12-12T13:06:29Z
  labels:
    app: logging-elk-elasticsearch
    chart: ibm-icplogging-2.2.0
    component: logstash
    heritage: Tiller
    release: logging
  name: logging-elk-logstash-config
  namespace: kube-system
  resourceVersion: "91621635"
  selfLink: /api/v1/namespaces/kube-system/configmaps/logging-elk-logstash-config
  uid: 35f2891b-1ce0-11ea-b481-506b8dfb518b

Logstash logs:

[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"pod"=>{"ip"=>"10.1.94.36"}, "offset"=>620854, "log"=>"  \"stream\": \"stderr\",", "kubernetes.pod"=>"logging-elk-filebeat-ds-9c6bd", "kubernetes.container_name"=>"filebeat", "input_type"=>"log", "source"=>"/var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-12T23:31:59.767Z, "stream"=>"stderr", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-12T23:31:59.767359186Z"}}
[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"offset"=>280076, "pod"=>{"ip"=>"10.1.94.36"}, "log"=>"  \"stream\": \"stderr\",", "kubernetes.pod"=>"logging-elk-filebeat-ds-9c6bd", "kubernetes.container_name"=>"filebeat", "input_type"=>"log", "source"=>"/var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-13T03:35:33.285Z, "stream"=>"stderr", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-13T03:35:33.285527164Z"}}
[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"pod"=>{"ip"=>"10.1.94.36"}, "offset"=>19024, "log"=>"2019-12-13 10:03:39.178 [INFO][76] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}", "kubernetes.pod"=>"calico-node-tgxkl", "kubernetes.container_name"=>"calico-node", "input_type"=>"log", "source"=>"/var/log/containers/calico-node-tgxkl_kube-system_calico-node-2a0252daf4ec37033cede126867eb7b054c33f555683c17a620be3048b0514a4.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"2a0252daf4ec37033cede126867eb7b054c33f555683c17a620be3048b0514a4", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-13T10:03:39.178Z, "stream"=>"stdout", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-13T10:03:39.178887864Z"}}
[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"offset"=>607350, "pod"=>{"ip"=>"10.1.94.36"}, "log"=>"  },", "kubernetes.pod"=>"logging-elk-filebeat-ds-9c6bd", "kubernetes.container_name"=>"filebeat", "input_type"=>"log", "source"=>"/var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-12T23:36:02.963Z, "stream"=>"stderr", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-12T23:36:02.963324975Z"}}
[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"offset"=>591443, "pod"=>{"ip"=>"10.1.94.36"}, "log"=>"  \"log\": \"  \\\"tags\\\": [\",", "kubernetes.pod"=>"logging-elk-filebeat-ds-9c6bd", "kubernetes.container_name"=>"filebeat", "input_type"=>"log", "source"=>"/var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-12T23:44:04.957Z, "stream"=>"stderr", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-12T23:44:04.957842018Z"}}
[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"offset"=>1143522, "pod"=>{"ip"=>"10.1.94.36"}, "log"=>"  \"tags\": [", "kubernetes.pod"=>"logging-elk-filebeat-ds-9c6bd", "kubernetes.container_name"=>"filebeat", "input_type"=>"log", "source"=>"/var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-12T19:59:22.141Z, "stream"=>"stderr", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-12T19:59:22.141379588Z"}}
[2019-12-13T10:30:37,535][DEBUG][logstash.pipeline        ] output received {"event"=>{"pod"=>{"ip"=>"10.1.94.36"}, "offset"=>883471, "log"=>"  \"input_type\": \"log\",", "kubernetes.pod"=>"logging-elk-filebeat-ds-9c6bd", "kubernetes.container_name"=>"filebeat", "input_type"=>"log", "source"=>"/var/log/containers/logging-elk-filebeat-ds-9c6bd_kube-system_filebeat-8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798.log", "type"=>"kube-logs", "tags"=>["k8s-app", "beats_input_raw_event"], "kubernetes.container_id"=>"8e76c31819d6bc2d43e72cc81387294a1b566b3518c6694d542db9e0ed5f2798", "node"=>{"hostname"=>"172.16.7.23"}, "@timestamp"=>2019-12-12T21:12:21.793Z, "stream"=>"stderr", "kubernetes.namespace"=>"kube-system", "beat"=>{"hostname"=>"logging-elk-filebeat-ds-9c6bd", "name"=>"logging-elk-filebeat-ds-9c6bd", "version"=>"5.5.1"}, "@version"=>"1", "time"=>"2019-12-12T21:12:21.793640482Z"}}
[2019-12-13T10:30:38,331][DEBUG][logstash.instrument.periodicpoller.cgroup] Error, cannot retrieve cgroups information {:exception=>"Errno::ENOENT", :message=>"No such file or directory - /sys/fs/cgroup/cpuacct/kubepods/burstable/pod916b36cc-1d0e-11ea-8b5c-506b8df5d4ee/7212fb00a902fb1ef991ea52125a16d98f4c117df6fca7af828589026d083625/cpuacct.usage"}

Update

I changed the path of the Filebeat input file so that I have control over it, and it seems to work fine this way: only 1 event is queued when I add a new line to the log file. However, when the input comes from logs generated by containers, the batch is always 2048 events. I know for a fact that not that many logs are generated between each scan. Any input on this? Why would logs with a different format get processed faster?
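
For clarity, the controlled test input I used looks roughly like this (the path is just a placeholder I created for the test, not one of the container log paths):

filebeat.prospectors:
- input_type: log
  paths:
    - /tmp/filebeat-test/test.log   # placeholder file I append test lines to by hand
  scan_frequency: 30s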
