Filebeat keeps getting error: ERROR pipeline/output.go:121 Failed to publish events: temporary bulk send failure, and needs a restart

Hi,

We are sending Cassandra logs to ES via Filebeat, and every few days it stops sending entries to ES. The following error is visible in the Filebeat logs:

2022-02-09T16:04:08.762Z        INFO    pipeline/output.go:105  Connection to backoff(elasticsearch(https://<ES Host Name>:443)) established
2022-02-09T16:04:15.424Z        ERROR   pipeline/output.go:121  Failed to publish events: temporary bulk send failure
2022-02-09T16:04:15.424Z        INFO    pipeline/output.go:95   Connecting to backoff(elasticsearch(https://<ES Host Name>:443))
2022-02-09T16:04:15.436Z        INFO    elasticsearch/client.go:739     Attempting to connect to Elasticsearch version 6.0.0
2022-02-09T16:04:15.460Z        INFO    template/load.go:128    Template already exists and will not be overwritten.
2022-02-09T16:04:15.460Z        INFO    instance/beat.go:889    Template successfully loaded.

Restarting Filebeat solves the issue, but after a few days the issue re-appears.

I have read through some of the posts in the forum with similar error messages but have not been able to link them to my issue.

filebeat version 6.8.12 (amd64), libbeat 6.8.12 [fdb5036adbe45aa10a03882b2245578ad17c3615 built 2020-08-12 06:26:46 +0000 UTC]

filebeat.yml

filebeat.prospectors:
- input_type: log
  fields:
    index: 54olqeye17
  paths:
    - "/var/log/cassandra/system.log*"
    - "/var/log/cassandra/gc.log.*.current"
  scan_frequency: 30s
  document_type: cassandra_system_logs
  exclude_files: ['\.zip$']
  multiline.pattern: '^TRACE|DEBUG|WARN|INFO|ERROR'
  multiline.negate: true
  multiline.match: after
  multiline.timeout: 5m
  backoff: 5s
  max_backoff: 10s

- input_type: log
  fields:
    index: 9q2beq2iuu
  paths:
    - "/var/log/cassandra/repair.log*"
  document_type: cassandra_logs
  exclude_files: ['\.zip$']
  multiline.pattern: '^TRACE|DEBUG|WARN|INFO|ERROR'
  multiline.negate: true
  multiline.match: after
  multiline.timeout: 5m
  backoff: 5s
  max_backoff: 10s

- input_type: log
  fields:
    index: 04wbin96l3
  paths:
    - "/var/log/cassandra/debug.log*"
  document_type: cassandra_logs
  exclude_files: ['\.zip$']
  multiline.pattern: '^TRACE|DEBUG|WARN|INFO|ERROR'
  multiline.negate: true
  multiline.match: after
  multiline.timeout: 5m
  backoff: 5s
  max_backoff: 10s

output.elasticsearch:
  hosts: ["https://<ES Host Name>:443"]
  index: '%{[fields.index]}'
  ssl.certificate: "/var/private/es-client.pem"
  ssl.key: "/var/private/es-client.key"
  backoff.init: 5

setup.template:
  name: anx
  pattern: anx

Any help would be highly appreciated. We are still hitting this issue.

We were able to capture the log messages from the moment the issue actually starts:

2022-02-14T08:24:05.634Z        INFO    log/harvester.go:255    Harvester started for file: /var/log/cassandra/gc.log.0.current
2022-02-14T08:24:31.337Z        INFO    [monitoring]    log/log.go:144  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":1195370,"time":{"ms":15}},"total":{"ticks":11405220,"time":{"ms":129},"value":11405220},"user":{"ticks":10209850,"time":{"ms":114}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"b71df7df-8d17-47d7-bf2f-2cc19b427900","uptime":{"ms":327810063}},"memstats":{"gc_next":17646256,"memory_alloc":11554968,"memory_total":808295726160,"rss":1855488}},"filebeat":{"events":{"active":1153,"added":1454,"done":301},"harvester":{"open_files":8,"running":8,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":300,"active":50,"batches":7,"total":350},"read":{"bytes":8436},"write":{"bytes":282387}},"pipeline":{"clients":3,"events":{"active":1153,"filtered":1,"published":1453,"total":1454},"queue":{"acked":300}}},"registrar":{"states":{"current":10,"update":301},"writes":{"success":7,"total":7}},"system":{"load":{"1":5.69,"15":12.34,"5":10.35,"norm":{"1":0.1185,"15":0.2571,"5":0.2156}}}}}}
2022-02-14T08:25:01.337Z        INFO    [monitoring]    log/log.go:144  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":1195420,"time":{"ms":42}},"total":{"ticks":11405400,"time":{"ms":173},"value":11405400},"user":{"ticks":10209980,"time":{"ms":131}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"b71df7df-8d17-47d7-bf2f-2cc19b427900","uptime":{"ms":327840063}},"memstats":{"gc_next":18594368,"memory_alloc":13403704,"memory_total":808312319632,"rss":2306048}},"filebeat":{"events":{"active":248,"added":2698,"done":2450},"harvester":{"open_files":8,"running":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":2450,"batches":49,"total":2450},"read":{"bytes":68894},"write":{"bytes":2154827}},"pipeline":{"clients":3,"events":{"active":1401,"published":2698,"total":2698},"queue":{"acked":2450}}},"registrar":{"states":{"current":10,"update":2450},"writes":{"success":49,"total":49}},"system":{"load":{"1":4.62,"15":12.04,"5":9.64,"norm":{"1":0.0963,"15":0.2508,"5":0.2008}}}}}}
2022-02-14T08:25:31.337Z        INFO    [monitoring]    log/log.go:144  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":1195480,"time":{"ms":68}},"total":{"ticks":11405940,"time":{"ms":548},"value":11405940},"user":{"ticks":10210460,"time":{"ms":480}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":15},"info":{"ephemeral_id":"b71df7df-8d17-47d7-bf2f-2cc19b427900","uptime":{"ms":327870063}},"memstats":{"gc_next":35748416,"memory_alloc":23068328,"memory_total":808410632808,"rss":22605824}},"filebeat":{"events":{"active":1386,"added":6582,"done":5196},"harvester":{"open_files":8,"running":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":5245,"active":-50,"batches":104,"failed":1,"total":5196},"read":{"bytes":147529},"write":{"bytes":5531935}},"pipeline":{"clients":3,"events":{"active":2787,"published":6582,"total":6582},"queue":{"acked":5196}}},"registrar":{"states":{"current":10,"update":5196},"writes":{"success":104,"total":104}},"system":{"load":{"1":3.67,"15":11.73,"5":8.93,"norm":{"1":0.0765,"15":0.2444,"5":0.186}}}}}}
2022-02-14T08:25:32.961Z        ERROR   pipeline/output.go:121  Failed to publish events: temporary bulk send failure
2022-02-14T08:25:32.962Z        INFO    pipeline/output.go:95   Connecting to backoff(elasticsearch(https://<ES Host Name>.com:443))
2022-02-14T08:25:32.974Z        INFO    elasticsearch/client.go:739     Attempting to connect to Elasticsearch version 6.0.0
2022-02-14T08:25:32.999Z        INFO    template/load.go:128    Template already exists and will not be overwritten.
2022-02-14T08:25:32.999Z        INFO    instance/beat.go:889    Template successfully loaded.
2022-02-14T08:25:32.999Z        INFO    pipeline/output.go:105  Connection to backoff(elasticsearch(https://<ES Host Name>.com:443)) established

For us, bulk_max_size is not set, so by default it is supposed to be 50. Can this be the reason?

From the Filebeat documentation for the Elasticsearch output:

bulk_max_size: The maximum number of events to bulk in a single Elasticsearch bulk API index request.

#bulk_max_size: 50
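For reference, this is roughly how bulk_max_size could be set explicitly on the Elasticsearch output, rather than relying on the default of 50. The values below are only illustrative and are not taken from our setup:

output.elasticsearch:
  hosts: ["https://<ES Host Name>:443"]
  index: '%{[fields.index]}'
  ssl.certificate: "/var/private/es-client.pem"
  ssl.key: "/var/private/es-client.key"
  backoff.init: 5
  # Illustrative value: send larger bulk requests per API call (default is 50).
  bulk_max_size: 200
  # Illustrative value: publish with two workers instead of the default single worker.
  worker: 2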

We were able to solve it ourselves. Thanks.
