Which packetbeat version are you testing with? Version 5.x adds flows support (it can be disabled), which might slow down processing a bit (to be optimized before GA), but the HTTP parser has been improved a little. Getting some packetbeat profiling data would be nice. Profiling requires a Go environment to be set up, but it can be done remotely.
How much CPU does packetbeat use when processing ~2000 req/sec? Seeing packet loss while packetbeat is not even using 100% CPU hints at either the output queues blocking or the sniffer implementation being inefficient (too much waiting due to the poll syscall).
In releases before the 5.0 alphas, the output queues could block the processing of new packets. This has been changed: if the output queue is full, the transaction is dropped (https://github.com/elastic/beats/blob/master/packetbeat/publish/publish.go#L53), but packet processing continues. This should help with the packet loss itself and with packetbeat dropping internal stream state. Testing packetbeat with the file output, or with the console output redirected to /dev/null (the output prints to stdout, the logging system prints to stderr), would be interesting.
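To illustrate the drop-if-full behaviour, here is a minimal sketch of a non-blocking enqueue in Go. It is not the actual code from publish.go; the Transaction and Publisher types and the TryPublish method are hypothetical names used only for this example.

```go
package main

import "fmt"

// Transaction is a hypothetical stand-in for a decoded protocol transaction.
type Transaction struct {
	ID int
}

// Publisher is a hypothetical bounded output queue that drops new
// transactions instead of blocking the caller when it is full.
type Publisher struct {
	queue   chan Transaction
	dropped int
}

// NewPublisher creates a publisher whose queue holds at most size items.
func NewPublisher(size int) *Publisher {
	return &Publisher{queue: make(chan Transaction, size)}
}

// TryPublish enqueues t if there is room and returns true. If the queue is
// full it counts the transaction as dropped and returns false right away,
// so the packet-processing loop that calls it never stalls.
func (p *Publisher) TryPublish(t Transaction) bool {
	select {
	case p.queue <- t:
		return true
	default:
		p.dropped++
		return false
	}
}

func main() {
	// Nothing consumes from the queue here, which simulates a stalled output.
	p := NewPublisher(2)
	for i := 0; i < 5; i++ {
		p.TryPublish(Transaction{ID: i})
	}
	fmt.Printf("queued=%d dropped=%d\n", len(p.queue), p.dropped)
	// prints "queued=2 dropped=3"
}
```

The trade-off is that a slow output costs published transactions rather than captured packets: the sniffer keeps up, and stream state is not lost because the processing loop is never waiting on the output.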
The sniffer is currently implemented by: https://github.com/tsg/gopacket
The package contains a sample tool, in the subdirectory examples/pcapdump, that sniffs packets and prints some stats every N packets.