We're planning to monitor Redis protocol communication with Packetbeat. What we want from Packetbeat is to decode and record the Redis commands exchanged between the server and its clients.
The test environment is somewhat like this:
Redis server: a VM with 4 vCPUs and 8 GB RAM
Redis client for benchmarking: my notebook ...
Redis version: 4.0.6, with default config
Packetbeat is running on the Redis server, sniffing the interface the Redis server listens on. The output is set to a local file.
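For context, the relevant part of our packetbeat.yml looks roughly like this (interface name, port, and output path are placeholders rather than our exact values, and the protocol section syntax can differ slightly between Packetbeat versions):

    packetbeat.interfaces.device: eth0      # interface the Redis server listens on
    packetbeat.protocols:
    - type: redis
      ports: [6379]                         # default Redis port
    output.file:
      path: "/var/log/packetbeat"
      filename: packetbeat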
We've tested with redis-benchmark and redis-cli.
With redis-benchmark
We only tested the Redis command LPUSH, pushing 100K strings onto the Redis server. All pushes were successful: when redis-benchmark finished, the length of the key "mylist" was exactly 100K. As shown in the redis-benchmark results, the command execution throughput was around 50K to 60K per second.
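The invocation was along these lines (host and options reproduced from memory, so treat it as a sketch rather than the exact command we ran):

    # 100K LPUSH requests; -k 1 keeps connections alive (the default)
    redis-benchmark -h 192.168.1.10 -p 6379 -t lpush -n 100000 -k 1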
Sadly, in the Packetbeat output file we only found around 40K to 50K LPUSH commands, which means around 50% to 60% packet loss. Packetbeat's own log shows "redis.unmatched_response" and "tcp_dropped_because_of_gaps" counters, but those numbers still don't add up to the difference.
We further tested redis-benchmark with keepalive turned off, and the packet loss was even greater.
With redis-cli
After the tests with redis-benchmark, we tried redis-cli, since it can set the interval between commands, letting us manually control the rate of Redis commands.
We've tested with:
interval set from 0.1s to 0.01s, 5K commands (LPUSH) per test run: Packetbeat shows no loss. However, the throughput of Redis commands is "quite" poor.
interval set to 0.0s, 10K commands (LPUSH): Packetbeat shows about 2% loss. The speed is around 1K/s. (A rough sketch of the invocation is below.)
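The redis-cli runs looked roughly like this (host and value are placeholders; the interval and repeat count were varied as described above):

    # repeat LPUSH 5000 times, waiting 0.01s between commands
    redis-cli -h 192.168.1.10 -r 5000 -i 0.01 lpush mylist somevalue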
As of now, I don't have the first-hand results of those tests at hand, but if anyone is interested, I may run them again later and update with exact numbers.
I'm not too surprised that Packetbeat can't keep up with redis-benchmark, but the drop rate seems larger than what I'd expect. I suspect redis-benchmark is making heavy use of pipelining, which might make Packetbeat's job of reconstructing the streams harder.
What was the CPU usage of Packetbeat during the test? Also, did the network interface report any drops in the RX queue?
About redis-benchmark using pipelining: from my understanding of the redis-benchmark man page, "keep alive" means redis-benchmark keeps one connection always alive and sends all the commands through this connection/pipeline. In my previous tests I tested with keepalive off, which I suppose means no persistent connection and every command is sent separately. However, the packet loss was even greater.
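For reference, the two behaviours map to separate redis-benchmark flags (if I read its help output correctly):

    # -k 0 reconnects for every request (keepalive off); -P controls pipelining
    redis-benchmark -t lpush -n 100000 -k 0     # keepalive off, no explicit pipelining
    redis-benchmark -t lpush -n 100000 -P 16    # pipeline 16 requests per round trip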
The CPU usage was not that significant during my previous tests. The average system load was around 4 to 6 on a VM with 4 vCPUs. If I remember right, Packetbeat spawned around 8 or 10 threads, and the total CPU usage of those threads was around 300% to 400%. And I'm quite sure there was no constantly busy thread (100% CPU usage all the time).
RX drops weren't taken into consideration at that time. I'll try to record more stats in my future tests.
I forgot to ask, did you use the afpacket sniffer type? Often most CPU is consumed by the sniffing itself, even before the Packetbeat code runs. Tuning snaplen and buffer_size_mb might help a bit.
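Something along these lines in packetbeat.yml, as a starting point rather than values tuned for your hardware:

    packetbeat.interfaces.device: eth0
    packetbeat.interfaces.type: af_packet
    packetbeat.interfaces.buffer_size_mb: 100   # kernel ring buffer shared with Packetbeat
    packetbeat.interfaces.snaplen: 1514         # with af_packet, a smaller snaplen lets more frames fit in the buffer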
My concern right now is to identify under what kind of conditions Packetbeat starts to "lose" some packets/Redis commands, or under what kind of conditions Packetbeat can guarantee no packet loss. The overall network traffic/Redis communication I need to decode is quite large, but it can be split and load-balanced.
I'll check these configurations and try to schedule another round of tests tomorrow morning. However, my time zone is +0800, so that's some hours away.
Also, about traffic capturing, is it still possible to use pf_ring with Packetbeat? I remember Packetbeat dropped this feature for some reason in some quite early version (maybe 2.X?).
My task is to monitor and audit the transactions that happen on dozens of Redis servers, and they can be really busy during some critical periods. I'm already considering some 10/40Gb network traffic splitting / load-balancing hardware. I hope I can try as many possibilities as I can.
We removed the pfring option because it didn't see usage and it broke when some Cgo changes happened around Go 1.4.
If your goal is auditing, so that no packet can be lost, it might be better to save the raw traffic via a dedicated sniffer (either a commercial offering, pfring, or something else) and then use Packetbeat in offline mode (the -I flag) to index the traffic. This requires a lot of disk space, but it gives you a "trail log", and you can scale the capturing and indexing separately.
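As a rough sketch of that workflow (tcpdump stands in for whatever capture tool you pick, and the paths are just examples):

    # 1) capture raw traffic to disk with a dedicated sniffer
    tcpdump -i eth0 -s 0 -w /data/redis-capture.pcap 'tcp port 6379'

    # 2) later, index the capture with Packetbeat reading from the file instead of the wire
    packetbeat -I /data/redis-capture.pcap -e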
If you still choose to use Packetbeat in online mode (this can still be advantageous), I have a couple of recommendations. Some of these might also apply if you use another tool to capture the traffic.
Use NICs based on newer Intel chipsets.
Load-balance the traffic to multiple NICs and, if needed, multiple servers. The load balancing needs to send the request and its response to the same NIC.
I'd aim for at most 50k PPS (packets per second) per NIC. That should be well within the sniffing power of af_packet.
Start a Packetbeat process for each network device (a sketch of how that might look follows this list).
Use af_packet with a large buffer size.
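Roughly like this, assuming one config file per interface that pins packetbeat.interfaces.device to its own NIC (file names and paths are illustrative):

    # one Packetbeat instance per sniffing interface, with separate config and data directories
    packetbeat -c /etc/packetbeat/packetbeat-eth2.yml --path.data /var/lib/packetbeat-eth2 &
    packetbeat -c /etc/packetbeat/packetbeat-eth3.yml --path.data /var/lib/packetbeat-eth3 &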
HTH. Just to manage expectations: be prepared to spend some time figuring things out; we don't see a ton of deployments like this.
Thank you for your advice.
Some of your recommendations have already been taken into account, like faster NICs and load balancing across several servers. pf_ring vs. af_packet is just something I'm weighing when choosing between "fewer but stronger servers with pf_ring" and "more, but comparatively weaker, servers with af_packet". But I still need a rough estimate of Packetbeat's processing capacity on a given hardware configuration with no packet loss. - -||