Unable to publish all events from Packetbeat to Elasticsearch during bursts

I've been characterizing a problem with Packetbeat configured to output directly to Elasticsearch. The YML is configured to use the in-memory internal queue (not the file spool queue).

The problem is that when I burst 250 events, all published within 500 milliseconds or so, there are times when not all 250 events show up in Elasticsearch. I've turned on enough debugging to see this in the Packetbeat log (set to debug level):

2019-11-24T15:56:44.069Z DEBUG [publisher] memqueue/produce.go:155 Dropping event, queue is blocked (seq=138)

This log line appears exactly as many times as there are events missing from Elasticsearch. Some bursts have no loss and some have significant loss; the variability suggests some sort of race condition or performance issue.

What I'd like to know most is: why would this memory queue be blocked? Here are the relevant sections of the packetbeat.yml file I'm using:

queue:
  mem:
    events: 65536
    #flush.min_events: 4096
    #flush.timeout: 5s

and ...

output.elasticsearch:
  bulk_max_size: 0

I've tried various combinations of settings for queue.mem.flush.min_events and queue.mem.flush.timeout; nothing seems to help.
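For example, one of the combinations I tried (the values here are just illustrative of the kind of tuning I attempted, aimed at flushing small bursts quickly) looked like:

queue:
  mem:
    events: 65536
    flush.min_events: 128
    flush.timeout: 1s

Even with flush.min_events well below the burst size, the drops still occurred.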

An explanation of why this blocking occurs would be most welcome!

Interesting. 250 events in 500 ms (about 500 events/s) shouldn't be a problem at all, and with events: 65536 a burst that small can't fill the queue by itself. How many events per second are you ingesting in total? It looks like your queue is permanently at capacity, so small bursts have to be discarded.

Which version of Packetbeat are you using? Can you share your full debug logs?

Thanks, Adrian, for the response.

I've been able to reproduce the problem with just these 250 events, with no other events being published by this Packetbeat at the time of the blocking.

I put a little extra code into libbeat/publisher/queue/memqueue/produce.go, which produces the "Dropping event, queue is blocked (seq=...)" line quoted in my first post.

I did a little digging into Go channels and blocking within select statements. A select statement with a default case never blocks: when none of the channel operations are ready, the default branch (which is where this drop occurs) runs immediately. If the default case is removed, the select instead waits until one of the other cases can proceed.
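To illustrate the difference, here's a minimal standalone sketch of the pattern (not the actual libbeat code; the channel, function names, and buffer size are made up for the example):

package main

import (
    "fmt"
    "time"
)

// tryPublish mimics the non-blocking path: with a default case,
// select never waits, so a full channel means the event is dropped.
func tryPublish(events chan int, ev int) bool {
    select {
    case events <- ev:
        return true
    default:
        return false // channel full: drop instead of waiting
    }
}

// publish mimics the blocking path: with no default case, the
// select waits until the channel has room.
func publish(events chan int, ev int) {
    select {
    case events <- ev:
    }
}

func main() {
    events := make(chan int, 2) // tiny buffer to force the "queue is blocked" situation

    tryPublish(events, 1)
    tryPublish(events, 2)
    if !tryPublish(events, 3) {
        fmt.Println("dropping event, queue is blocked") // same condition as the debug log
    }

    // Free a slot shortly; the blocking variant then waits instead of dropping.
    go func() {
        time.Sleep(100 * time.Millisecond)
        <-events
    }()
    publish(events, 3)
    fmt.Println("event 3 published after waiting")
}

If I'm reading produce.go right, the real select also has a case for the queue's done channel, but the behaviour is the same: without the default case, the producer waits for space instead of dropping.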

So ... I hacked a little and took out the default case, and it did indeed solve the drop problem. I've tested at least 20-30 runs of this 250-event burst, and without this default case in the module, Packetbeat consistently publishes all 250 events into Elasticsearch.
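(I assume the default case is there deliberately so a producer never blocks on a full queue, in which case removing it just trades dropping for backpressure on the producer. So I'd treat this as a diagnostic finding rather than a proper fix.)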

If you'd like me to send the log I can, but at this log level it's large, around 50 MB per burst. Send me an address and I can upload it.

Thanks again!
