Filebeat v8.11.3 crashes repeatedly with "panic: sync: negative WaitGroup counter"

After upgrading the Filebeat agent to v8.11.3, it started crashing frequently with the following panic message:

Jan 24 06:13:44 my-hostname filebeat[1534]: panic: sync: negative WaitGroup counter
Jan 24 06:13:44 my-hostname filebeat[1534]: goroutine 152 [running]:
Jan 24 06:13:44 my-hostname filebeat[1534]: sync.(*WaitGroup).Add(0x400155f928?, 0xaaaae2ec4e84?)
Jan 24 06:13:44 my-hostname filebeat[1534]:         sync/waitgroup.go:62 +0x10c
Jan 24 06:13:44 my-hostname filebeat[1534]: sync.(*WaitGroup).Done(...)
Jan 24 06:13:44 my-hostname filebeat[1534]:         sync/waitgroup.go:87
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/filebeat/beater.(*eventCounter).Done(0x400155f928?)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/filebeat/beater/channels.go:103 +0x9c
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/filebeat/beater.(*finishedLogger).Published(0x4001aa02d8, 0x22)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/filebeat/beater/channels.go:89 +0x38
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/filebeat/beater.eventACKer.func1(0x400155fbc8?, {0x4005e14640, 0x22, 0xaaaae0cf0328?})
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/filebeat/beater/acker.go:65 +0x360
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/common/acker.(*eventDataACKer).onACK(0x4001aad600, 0xaaaae0cdfd1c?, 0x22)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/common/acker/acker.go:257 +0x194
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/common/acker.(*trackingACKer).ACKEvents(0x4000e508a0, 0x22)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/common/acker/acker.go:206 +0x330
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/common/acker.ackerList.ACKEvents(...)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/common/acker/acker.go:294
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue.(*bufferingEventLoop).processACK.func1()
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue/eventloop.go:515 +0x2c
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue.(*bufferingEventLoop).processACK(0x400055fa70, {0x0, 0x0}, 0x22)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue/eventloop.go:525 +0x14c
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue.(*ackLoop).handleBatchSig(0x4000561040)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue/ackloop.go:73 +0x94
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue.(*ackLoop).run(0x4000561040)
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue/ackloop.go:52 +0x14c
Jan 24 06:13:44 my-hostname filebeat[1534]: github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue.NewQueue.func2()
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue/broker.go:201 +0x58
Jan 24 06:13:44 my-hostname filebeat[1534]: created by github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue.NewQueue
Jan 24 06:13:44 my-hostname filebeat[1534]:         github.com/elastic/beats/v7/libbeat/publisher/queue/memqueue/broker.go:199 +0x4fc
Jan 24 06:13:44 my-hostname systemd[1]: filebeat.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 24 06:13:44 my-hostname systemd[1]: filebeat.service: Failed with result 'exit-code'.
Jan 24 06:13:44 my-hostname systemd[1]: filebeat.service: Consumed 1min 46.616s CPU time.
Jan 24 06:13:45 my-hostname systemd[1]: filebeat.service: Scheduled restart job, restart counter is at 1.
Jan 24 06:13:45 my-hostname systemd[1]: Stopped Filebeat sends log files to Logstash or directly to Elasticsearch..
Jan 24 06:13:45 my-hostname systemd[1]: filebeat.service: Consumed 1min 46.616s CPU time.
Jan 24 06:13:45 my-hostname systemd[1]: Started Filebeat sends log files to Logstash or directly to Elasticsearch..

I don't see any other errors related to this panic message.
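For context, the message itself comes from Go's standard library. `sync.WaitGroup.Done()` is `Add(-1)`, and `Add` panics with exactly this text when the counter would drop below zero. These are the `sync/waitgroup.go` frames at the top of the trace. The minimal sketch below reproduces the same panic outside Filebeat. It uses only the standard library and does not model any Filebeat code. The function name `doneTooOften` is illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// doneTooOften calls Done once more than Add allowed and returns the
// recovered panic value as a string ("" if no panic occurred).
func doneTooOften() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done() // counter would go to -1, so Add(-1) panics here
	return ""
}

func main() {
	fmt.Println(doneTooOften()) // prints: sync: negative WaitGroup counter
}
```

In other words, the panic means some code path called `Done` more times than events were added, rather than pointing to a resource limit.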

Increasing the queue and output sizes from their defaults to the following values (or higher) partially mitigates it:

output.elasticsearch.bulk_max_size: 2400
output.elasticsearch.worker: 1
queue.mem.events: 4800
queue.mem.flush.min_events: 2400

However, the Filebeat agent still crashes occasionally with the same panic.

We are seeing this issue in 8.12.0 and 8.11.4 as well. We were originally on 8.9.0.

For reference: "Race in memqueue leads to panic: sync: negative WaitGroup counter" (elastic/beats issue #37702 on GitHub).
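The trace shows the panic being raised on the acknowledgement path (memqueue `ackLoop` → acker → `eventACKer` → `finishedLogger.Published` → `eventCounter.Done`). The sketch below is a hypothetical, simplified model, not Filebeat's actual code. It assumes a `WaitGroup` is incremented once per published event and decremented once per acknowledged event, and it shows that if the same batch were acknowledged twice, which is one way a race could plausibly surface, the duplicate acknowledgement drives the counter negative. The `counter` type, its `Published`/`Acked` methods, and the batch size of 34 (the value passed as `0x22` in several trace frames) are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical stand-in for an in-flight event counter:
// Published adds n events, Acked calls Done once per acknowledged event.
type counter struct {
	wg sync.WaitGroup
}

func (c *counter) Published(n int) { c.wg.Add(n) }

// Acked calls Done n times and returns the recovered panic message
// if the counter would go negative ("" otherwise).
func (c *counter) Acked(n int) (panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	for i := 0; i < n; i++ {
		c.wg.Done()
	}
	return ""
}

func main() {
	var c counter
	c.Published(34)                 // one batch of 34 events enters the pipeline
	fmt.Printf("%q\n", c.Acked(34)) // first ACK brings the counter back to 0
	fmt.Printf("%q\n", c.Acked(34)) // duplicate ACK for the same batch panics
}
```

Running it prints `""` followed by `"sync: negative WaitGroup counter"`. Under this model, larger queue and batch settings would only change how often such a duplicate occurs, not prevent it, which would be consistent with the mitigation above being only partial.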

