Missing packets with Packetbeat?

Hi, in our network we use packetbeat in 'af_packet' mode on Red Hat 6.5 (22-core CPU, 768 GB memory, 16 SATA disks, 10 Gb/s NIC) to analyze HTTP traffic of about 4 Gbps, and it does not work very well: too many packets are missed. So I would like to know the performance limit of packetbeat, and whether Elastic has any advice for speeding it up.

By the way, I know there are high-performance packet-capture solutions on Linux such as DPDK. Will packetbeat consider supporting them? Thanks.

Actually, af_packet can be quite fast, because it supports multiple receive queues and therefore allows some parallelism. The performance gains of DPDK (and similar frameworks) come mostly from bypassing most of the network stack and, somewhat more importantly, from parallelising processing across multiple receive queues (virtual or physical). Parallelising processing is a load-balancing problem, and for analysis a stable, hash-based load balancing is required, so the load distribution across queues and analysers depends on the actual traffic patterns.

Unfortunately packetbeat currently does not support multiple af_packet receive queues; it configures only one queue, so all analysis is fully single-threaded. The only way to get some load balancing is to have multiple devices (e.g. your network adapter and driver support creating virtual devices to sniff from, or your tap supports load balancing) and start one packetbeat per device.
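For illustration, here is a minimal sketch of that per-device workaround, assuming the NIC or tap exposes two (hypothetical) devices named eth0v0 and eth0v1; each packetbeat instance gets its own configuration file and runs as a separate process:

# packetbeat-eth0v0.yml (first instance; the device names are assumptions)
packetbeat.interfaces.device: eth0v0
packetbeat.interfaces.type: af_packet
packetbeat.interfaces.buffer_size_mb: 100

# packetbeat-eth0v1.yml (second instance, identical except for the device)
packetbeat.interfaces.device: eth0v1
packetbeat.interfaces.type: af_packet
packetbeat.interfaces.buffer_size_mb: 100

Each instance is then started with its own configuration, for example packetbeat -c packetbeat-eth0v0.yml, so the two capture and analysis pipelines run independently.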

Another potential problem is the output. If your target system cannot cope with the number of events packetbeat generates, packetbeat drops transactions (effectively giving you a kind of sampling).

Thanks for the reply. So can I start multiple packetbeat instances with different BPF filters on a single device to get load balancing? I have 768 GB of memory and 22 CPU cores, and I want to find a way to improve efficiency. Thanks.

Also, I found several parameters in packetbeat.yml:

packetbeat.interfaces.buffer_size_mb
queue.mem.events
queue.mem.flush.min_events
output.elasticsearch.bulk_max_size
output.elasticsearch.worker

What is the relationship between these parameters? Are there any recommended values for them?

thanks
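For reference, a rough annotated sketch of where these settings live in packetbeat.yml and how they fit together (the values are placeholders, not recommendations):

packetbeat.interfaces.buffer_size_mb: 100   # size of the kernel/user-space capture buffer per sniffed device; a larger buffer absorbs traffic bursts before packets are dropped at capture time
queue.mem.events: 4096                      # capacity of the internal memory queue buffering events between the protocol analysers and the outputs
queue.mem.flush.min_events: 2048            # minimum number of events collected before the queue hands a batch to the output
queue.mem.flush.timeout: 1s                 # maximum time to wait for min_events before flushing anyway
output.elasticsearch.worker: 2              # number of concurrent bulk connections per configured Elasticsearch host
output.elasticsearch.bulk_max_size: 2048    # maximum number of events per bulk request pulled from the queue by each worker

Roughly, packets flow from the capture buffer through the protocol analysers into the memory queue, and the Elasticsearch output workers pull batches of up to bulk_max_size events from that queue; if Elasticsearch cannot keep up, the queue fills and packetbeat starts dropping transactions, as described above.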

All these packetbeat instances would still serve the same queue. Normally load balancing should be done with multiple receive queues, with help from the NIC. With BPF alone, every filter still has to be applied to every single packet, and you would have a multitude of filters executing before a packet even reaches packetbeat. I don't think you will see much benefit from running multiple packetbeat instances with different BPF filters.

Thanks, steffens. After I changed the params to

queue.mem.events: 409600
queue.mem.flush.min_events: 0
queue.mem.flush.timeout: 0s
bulk_max_size: 40960
worker: 10

it seems better now: the indexing rate went from about 2,000/s to almost 20,000/s. But I see many error logs; what do they mean?

2018/01/24 08:40:07.001454 log.go:175: ERR ParseHttp exception. Recovering, but please report this: runtime error: slice bounds out of range.
2018/01/24 08:40:07.001591 log.go:176: ERR Stacktrace: goroutine 250 [running]:
2018/01/24 08:40:07.020815 log.go:175: ERR ParseHttp exception. Recovering, but please report this: runtime error: slice bounds out of range.
2018/01/24 08:40:07.020937 log.go:176: ERR Stacktrace: goroutine 250 [running]:
2018/01/24 08:40:07.078670 log.go:175: ERR ParseHttp exception. Recovering, but please report this: runtime error: slice bounds out of range.
2018/01/24 08:40:07.078795 log.go:176: ERR Stacktrace: goroutine 250 [running]:
(the same ParseHttp error and "Stacktrace: goroutine 250 [running]" lines repeat many more times in the log)

The HTTP parser failed on some packets. Parsing errors can be triggered by packet loss, or right after packetbeat starts up. To parse the application layer, the parsers in packetbeat must be in sync with the actual TCP streams; on startup or after packet loss packetbeat is not in sync and may start parsing somewhere in the middle of an HTTP message. When it hits parsing errors, packetbeat throws away the TCP connection's current state and attempts to parse again, until it is eventually back in sync.

Alternatively, instead of packetbeat simply being out of sync with the network traffic, this could also be a bug. In any case, the presence of a stack trace indicates the parser could be made more robust against errors.

Can you please post the full stack trace that follows this message? Also check your logs for messages indicating packet drops (gaps).
