Packetbeat, af_packet and Linux 2.6

Hi, everyone.

I'm having some trouble trying to set up Packetbeat on a Linux VM with RHEL 6.7 (Linux 2.6.32).

If I use the following settings:

packetbeat.interfaces.device: any
packetbeat.interfaces.type: af_packet

then Packetbeat crashes after not being able to allocate enough memory:

2017/07/26 11:47:11.648739 beat.go:339: CRIT Exiting: Initializing sniffer failed: Error creating sniffer: setsockopt packet_rx_ring: cannot allocate memory
Exiting: Initializing sniffer failed: Error creating sniffer: setsockopt packet_rx_ring: cannot allocate memory

As far as I can tell from what strace shows me, in this case it tries to allocate 3 contiguous blocks, 8MB each:

[pid 22966] 14:47:28 setsockopt(4, SOL_PACKET, PACKET_RX_RING, {block_size=8388608, block_nr=3, frame_size=65536, frame_nr=384}, 16) = -1 ENOMEM (Cannot allocate memory)
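As a sanity check on the strace line above, the ring geometry and the size of each contiguous allocation can be worked out directly. This is only a minimal sketch: the 4 KiB page size is an assumption about this machine (the usual x86 value), and `page_order` is a helper name made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Values copied from the strace output above.
block_size = 8388608
block_nr = 3
frame_size = 65536
frame_nr = 384

def page_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order n such that page_size * 2**n >= nbytes."""
    pages = -(-nbytes // page_size)  # integer ceiling division
    return (pages - 1).bit_length()

frames_per_block = block_size // frame_size
ring_mib = block_size * block_nr // (1024 * 1024)
order = page_order(block_size)

print(f"frames per block: {frames_per_block}")
print(f"frame_nr consistent: {frames_per_block * block_nr == frame_nr}")
print(f"ring size: {ring_mib} MiB")
print(f"order of one block: {order}")
```

Under that assumption each 8 MiB block is an order-11 request (2048 contiguous pages), whereas a 4 MiB chunk would be order 10.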

The system where I'm trying to run Packetbeat definitely has enough free memory and the fragmentation level is not that high (there are many free 4MB chunks according to /proc/buddyinfo).
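To make the /proc/buddyinfo check concrete, here is a small sketch that turns one line of that file into free-block counts per size. Each column is the number of free blocks of order 0, 1, 2 and so on. The sample line is made up for illustration, the 4 KiB page size is an assumption, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def parse_buddyinfo_line(line):
    """Return (node, zone, counts) for one /proc/buddyinfo line."""
    parts = line.split()
    # Layout: "Node 0, zone Normal c0 c1 c2 ..."
    node = int(parts[1].rstrip(","))
    zone = parts[3]
    counts = [int(c) for c in parts[4:]]
    return node, zone, counts

def largest_free_block(counts, page_size=PAGE_SIZE):
    """Size in bytes of the largest block with a non-zero free count, or 0."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return page_size << order
    return 0

# Hypothetical sample line, not taken from the machine in question.
sample = "Node 0, zone   Normal   1204   873   512   260   131   64   30   15   8   4   27"

node, zone, counts = parse_buddyinfo_line(sample)
for order, free in enumerate(counts):
    size_kib = (PAGE_SIZE << order) // 1024
    print(f"order {order:2d} ({size_kib:5d} KiB): {free} free")
print("largest free block:", largest_free_block(counts) // (1024 * 1024), "MiB")
```

With 11 columns, as in the sample, the largest block the file can report is order 10, which is 4 MiB with 4 KiB pages. If the real file on this kernel also has 11 columns (worth checking), an 8 MiB block is larger than anything the allocator tracks there, regardless of how much memory is free.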

I've tried stopping the service I'm trying to monitor, dropping caches (echo 3 > /proc/sys/vm/drop_caches), compacting memory (echo 1 > /proc/sys/vm/compact_memory) and then running Packetbeat again. This led to the same error. Tuning packetbeat.interfaces.buffer_size_mb didn't help either.

Now, if I set packetbeat.interfaces.snaplen to 1514 (network interface MTU is 1500) Packetbeat launches and starts capturing and processing traffic, but I'm seeing a lot of dropped TCP connections:

2017/07/26 11:46:23.885889 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=6 libbeat.publisher.published_events=16 tcp.dropped_because_of_gaps=70
2017/07/26 11:46:28.885942 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=4 libbeat.publisher.published_events=14 tcp.dropped_because_of_gaps=194
2017/07/26 11:46:33.886157 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=8 libbeat.publisher.published_events=15 tcp.dropped_because_of_gaps=122
2017/07/26 11:46:38.886054 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=6 libbeat.publisher.published_events=14 tcp.dropped_because_of_gaps=134
2017/07/26 11:46:43.885981 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=9 libbeat.publisher.published_events=25 tcp.dropped_because_of_gaps=117
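To compare runs with different snaplen values, the gap counter can be summed from these log lines. A minimal sketch follows; the regular expression and the `sum_metric` helper are my own and only assume the `key=value` layout visible above.

```python
import re

LOG = """\
2017/07/26 11:46:23.885889 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=6 libbeat.publisher.published_events=16 tcp.dropped_because_of_gaps=70
2017/07/26 11:46:28.885942 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=4 libbeat.publisher.published_events=14 tcp.dropped_because_of_gaps=194
2017/07/26 11:46:33.886157 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=8 libbeat.publisher.published_events=15 tcp.dropped_because_of_gaps=122
2017/07/26 11:46:38.886054 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=6 libbeat.publisher.published_events=14 tcp.dropped_because_of_gaps=134
2017/07/26 11:46:43.885981 metrics.go:39: INFO Non-zero metrics in the last 5s: http.unmatched_responses=9 libbeat.publisher.published_events=25 tcp.dropped_because_of_gaps=117
"""

# Matches tokens such as "tcp.dropped_because_of_gaps=70".
METRIC_RE = re.compile(r"(\S+)=(\d+)")

def sum_metric(log_text, name):
    """Sum one metric over all 'Non-zero metrics' lines in the log text."""
    total = 0
    for line in log_text.splitlines():
        if "Non-zero metrics" not in line:
            continue
        for key, value in METRIC_RE.findall(line):
            if key == name:
                total += int(value)
    return total

dropped = sum_metric(LOG, "tcp.dropped_because_of_gaps")
published = sum_metric(LOG, "libbeat.publisher.published_events")
print(f"dropped because of gaps: {dropped}, published events: {published}")
```

Running the same script over a log captured with a different snaplen gives a quick before/after comparison over the same time window.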

This happens even with low CPU usage and large buffer_size_mb values.
I assume this has something to do with TCP segmentation offload.

Setting the snaplen parameter to 32767 helps, but it still drops connections because of gaps from time to time. Setting it any higher leads to Packetbeat refusing to start.
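The 32767 threshold can be explained with a little arithmetic. The sizing rule below is a reconstruction, not a quote from Packetbeat's source: it assumes the frame size is the snaplen rounded to the page size, that one block holds 128 frames, and that the number of blocks is the buffer size divided by the block size. With a snaplen of 65535 and a 30 MB buffer (assumed defaults) it reproduces the values in the strace output. The function names are hypothetical.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
FRAMES_PER_BLOCK = 128  # assumption: matches block_size / frame_size in the strace output

def ring_layout(snaplen, buffer_size_mb=30, page_size=PAGE_SIZE):
    """Hypothetical reconstruction of how the ring is sized from snaplen."""
    if snaplen < page_size:
        # Fit a whole number of frames into one page.
        frame_size = page_size // (page_size // snaplen)
    else:
        # Next multiple of the page size strictly above snaplen.
        frame_size = (snaplen // page_size + 1) * page_size
    block_size = frame_size * FRAMES_PER_BLOCK
    block_nr = (buffer_size_mb * 1024 * 1024) // block_size
    return frame_size, block_size, block_nr

def page_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order n such that page_size * 2**n >= nbytes."""
    pages = -(-nbytes // page_size)  # integer ceiling division
    return (pages - 1).bit_length()

for snaplen in (1514, 32767, 32768, 65535):
    frame_size, block_size, block_nr = ring_layout(snaplen)
    print(f"snaplen={snaplen:5d} frame={frame_size:5d} "
          f"block={block_size // 1024:5d} KiB x {block_nr:3d} "
          f"order={page_order(block_size)}")
```

Under these assumptions the block size depends only on snaplen, which would explain why buffer_size_mb changes the number of blocks but not whether a single block can be allocated. A snaplen of 32767 gives 4 MiB blocks (order 10), while anything higher needs an order-11 block.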

I have another machine with CentOS 7 (Linux 3.10.0) and Packetbeat runs fine there with the default capture length.

Has anybody had any success running Packetbeat with af_packet and the default capture length on 2.4/2.6 kernels?

For af_packet to work, the kernel might require some contiguous physical memory to allocate/reserve for sniffing. If the kernel cannot find this much space (due to fragmentation or overall system memory usage), instantiation might fail.

Try setting the snaplen to the expected network MTU size. Without jumbo frames, that is:

packetbeat.interfaces.snaplen: 1514

You can also change the buffer size via packetbeat.interfaces.buffer_size_mb: X. The default is 30 MB.

See: https://www.elastic.co/guide/en/beats/packetbeat/current/configuration-interfaces.html#_sniffing_options

For af_packet to work, the kernel might require some contiguous physical memory to allocate/reserve for sniffing. If the kernel cannot find this much space (due to fragmentation or overall system memory usage), instantiation might fail.

Yes, but as I said in the original post, the machine has enough free memory. I've even tried stopping the service I'm trying to monitor, dropping caches, manually invoking memory compaction and launching Packetbeat before any other 'heavy' services. It still fails to allocate memory.

Try setting the snaplen to the expected network MTU size.

I've tried that and it led to Packetbeat losing massive amounts of TCP connections:

Now, if I set packetbeat.interfaces.snaplen to 1514 (network interface MTU is 1500) Packetbeat launches and starts capturing and processing traffic, but I'm seeing a lot of dropped TCP connections:
...
This happens even with low CPU usage and large buffer_size_mb values.
I assume this has something to do with TCP segmentation offload.
...
Setting the snaplen parameter to 32767 helps, but it still drops connections because of gaps from time to time. Setting it any higher leads to Packetbeat refusing to start.

I've also tried tuning buffer_size_mb, it doesn't help:

Tuning packetbeat.interfaces.buffer_size_mb didn't help either.

So the real question is: is there anything in Linux 2.6 preventing Packetbeat, or rather af_packet, from working properly (i.e. allocating large contiguous blocks of memory)?

Uhm... kernel 2.6 is somewhat old. No idea if compaction really frees up the amount of contiguous physical memory you expect (contiguous physical pages must be available).

Using NIC offloading might introduce some other problems here. Packets can even become bigger than 65k or are sometimes padded, so they no longer have their original on-wire size.

Also, gaps might be due not only to weird packet sizes and padding, but also to packet loss (Packetbeat or the sniffer not keeping up with the amount of traffic).

kernel 2.6 is somewhat old.

Yeah, unfortunately I'm stuck with it on these machines.

Using NIC offloading might introduce some other problems here. Packets can even become bigger than 65k or are sometimes padded, so they no longer have their original on-wire size.

So I guess there is no safe way to use Packetbeat on a system with offloading, then? I might try disabling it.

Also, gaps might be due not only to weird packet sizes and padding, but also to packet loss (Packetbeat or the sniffer not keeping up with the amount of traffic).

True, but I think offloading is the main culprit here, since the number of connections dropped because of gaps goes way down with higher snaplen values.

I'll see if disabling offloading helps (assuming the CPU usage doesn't go through the roof without it).

I've disabled NIC offloading and Packetbeat works fine now, since there is no need to use large capture length values. No drops so far, everything is working smoothly and the CPU usage is normal.
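For anyone who finds this later: offloading is typically toggled with `ethtool -K`. The sketch below only builds the command and prints it unless asked to run it. The interface name `eth0` and the feature list (tso, gso, gro, lro) are assumptions, since which offloads matter depends on the NIC and driver, and the helper names are hypothetical.

```python
import subprocess

def offload_command(interface, features=("tso", "gso", "gro", "lro"), state="off"):
    """Build an `ethtool -K` argv that sets each feature to the given state."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    cmd = ["ethtool", "-K", interface]
    for feature in features:
        cmd += [feature, state]
    return cmd

def apply_offload(interface, dry_run=True, **kwargs):
    """Print the command (dry run) or execute it; executing needs root."""
    cmd = offload_command(interface, **kwargs)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd

# Hypothetical interface name; this call only prints the command.
apply_offload("eth0")
```

Settings changed this way do not survive a reboot, so they would also need to go into the distribution's network configuration to stick.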

Cool. This indicates the main problem has been the NIC offloading.

Offloading exists to free the machine/CPU/memory from doing extra work on TCP packets, and it is generally a good thing to have. It's one of the reasons we set the default snaplen to 65k (although offloading can generate even bigger packets).

NIC offloading can be quite painful to monitor and messes with flows support in Packetbeat. But the protocols themselves are basically deep-packet inspection, just caring about the actual content. They should still work fine with offloading enabled. I still wonder why offloading produced these problems for you. If it's possible, I'd love to get my hands on a raw packet capture to figure out whether this can be improved in Beats.

If possible, it's good practice not to run the packet monitoring tool on the host itself, but to feed the actual packets (as seen on the wire) into the monitoring tool via switch port forwarding or a tap. This gives you a less skewed view of what's actually happening in the network (especially when offloading is enabled) and decouples production machines from the monitoring solution.

But the protocols themselves are basically deep-packet inspection, just caring about the actual content. They should still work fine with offloading enabled. I still wonder why offloading produced these problems for you. If it’s possible, I’d love to get my hands on a raw packet capture to figure out whether this can be improved in Beats.

Well, the problem was not with offloading itself. It's just that I wasn't able to make large snaplen values work, so Packetbeat was losing some of the packets that were merged into large segments. It would probably work fine even with offloading if not for the strange inability to allocate large blocks of memory. That's probably why there aren't many topics about this kind of issue: the majority of Packetbeat users are probably running it on machines with modern kernels.
