Packetbeat - no way to discover failed TCP connection attempts


(Florin Andrei) #1

This is a basic task that could easily be done with tcpdump and I was hoping to do it with packetbeat as well.

Let's say a process on the current instance is trying to connect to an external endpoint on port 443. This being HTTPS, packetbeat.protocols.http would not work.

But suppose the remote endpoint is down anyway. My client on the current instance tries to connect, sends out a SYN packet, but it gets an RST (reset) packet. The whole exchange takes less than a second.

If I go on Kibana and search for this flow by dest.ip, it does get recorded, but it's broken in several ways.

  1. Packetbeat does not seem to realize that the transaction is done. The RST packet was sent back in under 1 second, but packetbeat keeps following the "flow" until it hits the expiration time. This is wrong. An RST packet means the transaction is done, stop following it.

  2. There is no way to distinguish between sessions that connect successfully (SYN, SYN/ACK, ACK, data exchange, FIN, FIN/ACK, FIN, FIN/ACK), and sessions that fail (are terminated with RST, or with an ICMP port unreachable or something like that).

This greatly reduces the usefulness of packetbeat when troubleshooting connection failures. It seems like packetbeat was written with the higher layers in mind, but at lower layers it is not quite usable. What I'm saying is, a network engineer would find packetbeat lacking in a number of ways, some of which are listed above. It's great when things work well, but it's less useful when there are failures below application level.

What would be VERY useful is something akin to Follow TCP Stream in Wireshark. If that is not possible, at least indicate somehow the TCP flows that failed to connect, and make sure to catch the RST packets, and ICMP packets, that terminate a flow early. And indicate on the flow the specific way in which it has failed (e.g. "RST received").


(Steffen Siering) #2

The flows feature for monitoring lower-level layers in Packetbeat is relatively new and currently doesn't take any protocol layers into account. It only counts packets and bytes. What's missing in flows is awareness of protocols (at any layer). For example, at the TCP level it could summarise TCP flag usage and (as you noted) not time out the flow while the TCP connection is still active, but indicate the reason the TCP connection terminated. I also hope to collect application-layer stats (e.g. number of successful/failed transactions) within flows.

Currently we don't have a ticket for flow enhancements. Maybe you want to share some more ideas before we create a ticket?


(Florin Andrei) #3

Packetbeat already works well enough when there are no failures. The HTTP traffic dissector is pretty powerful already. Flows are recorded correctly - provided there's no failure. What's missing is treating failures in a useful way.

Make the flows feature aware of failures. If the client gets an RST packet back from server (or an ICMP port unreachable, host unreachable, etc), that's it, game over, terminate the flow, don't wait for timeout.

In the last event in the flow, you should clearly indicate how the flow ended. Was it terminated normally? (FIN - FIN/ACK) Or was termination forced? And if it was forced, how exactly? (RST, or ICMP, etc.) I want to be able to go on Kibana and search for "all outbound connection attempts to address X.Y.Z.K on port 443 that failed to connect for whatever reason".

Some TCP sessions connect, exchange some data, then the connection is terminated forcefully. Others try to connect, but are rejected before any data is exchanged. Yet others connect, exchange data, and are terminated gracefully. I want to be able to distinguish these categories.

Some TCP connections end up in the half-open state. I'm not sure if there's a one size fits all treatment for that. Maybe wait until timeout and mark it as "half-open"?

Some firewalls have DROP rules. A SYN packet goes out, but there's no reply, ever. I want to see that in Packetbeat, and distinguish it from a rejection via RST or ICMP. It's also different from a session timing out after data exchange because here not even the 3-way handshake was completed; that's a meaningful aspect that should be recorded.
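
The categories above can be sketched roughly as follows. This is only an illustration of the idea, not how Packetbeat works: the `Outcome` names, the `classify` function, and the `"direction:FLAGS"` event strings are all made up for this example, and it assumes the caller hands over every event seen for one flow up to its timeout.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for how a flow ended; not Packetbeat fields.
    GRACEFUL = "closed gracefully (FIN in both directions)"
    RESET_AFTER_CONNECT = "connected, then terminated by RST"
    REJECTED_RST = "rejected by RST before the handshake completed"
    REJECTED_ICMP = "rejected by ICMP unreachable before the handshake completed"
    NO_REPLY = "no reply at all (e.g. firewall DROP rule)"
    HALF_OPEN = "still open or half-open at timeout"


def classify(events):
    """Classify one client-initiated flow.

    events is an ordered list of strings such as "out:SYN", "in:SYN/ACK",
    "in:RST/ACK" or "in:ICMP_UNREACH" ("out" means client to server).
    """
    synack_seen = established = False
    fin_out = fin_in = any_inbound = False
    for event in events:
        direction, kind = event.split(":", 1)
        flags = set(kind.split("/"))
        if direction == "in":
            any_inbound = True
        if kind == "ICMP_UNREACH":
            if not established:
                return Outcome.REJECTED_ICMP
            continue  # after the handshake, leave the decision to TCP itself
        if "RST" in flags:
            return Outcome.RESET_AFTER_CONNECT if established else Outcome.REJECTED_RST
        if direction == "in" and flags == {"SYN", "ACK"}:
            synack_seen = True
        elif direction == "out" and "ACK" in flags and synack_seen:
            established = True  # 3-way handshake completed
        if "FIN" in flags:
            if direction == "out":
                fin_out = True
            else:
                fin_in = True
    # No terminating event was seen, so the flow ran into its timeout.
    if fin_out and fin_in:
        return Outcome.GRACEFUL
    if not any_inbound:
        return Outcome.NO_REPLY
    return Outcome.HALF_OPEN
```

With something like this, `classify(["out:SYN", "out:SYN", "out:SYN"])` lands in the "no reply" bucket, while a SYN answered by RST lands in the "rejected by RST" bucket, which is exactly the distinction I'd like to be able to search for.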

The Follow TCP Stream feature in Wireshark is fantastic. Perhaps it does not make sense to clone it in Packetbeat, but it should be used as a source of ideas.

Please let me know when you open a ticket, I'd like to subscribe to it.


(Florin Andrei) #4

Some of the things I've said above about TCP should also apply to UDP and ICMP.

If you send a request, no matter what protocol, and expect a reply (DNS query / DNS reply, or ICMP echo request / ICMP echo reply) but the reply never comes, that should be noted after some reasonable timeout.

Or if you send a request and an ICMP destination unreachable comes back, that should be noted, along with the specific details of the error (e.g. it was ICMP Port Unreachable). It doesn't matter whether the protocol carrying the request is TCP or UDP.

In a nutshell, event correlation for error conditions.
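
To make the correlation part concrete: an ICMP destination unreachable message carries the IP header of the offending packet plus at least its first 8 bytes, which is enough to recover the addresses and the TCP or UDP ports of the original request. A minimal sketch, assuming IPv4 and a raw ICMP message as input; the function name `correlate_icmp_error` and the returned dictionary layout are invented for this example.

```python
import socket
import struct

# ICMP type 3 (destination unreachable) codes; only a few common ones listed.
UNREACH_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    13: "communication administratively prohibited",
}
PROTOCOLS = {6: "tcp", 17: "udp"}


def correlate_icmp_error(icmp):
    """Return the flow that an ICMP destination unreachable refers to, or None.

    icmp is the raw ICMP message (header and payload, without the outer IP header).
    """
    if len(icmp) < 8 + 20 or icmp[0] != 3:
        return None  # too short, or not a destination unreachable message
    inner = icmp[8:]  # embedded IP header + start of the original datagram
    header_len = (inner[0] & 0x0F) * 4
    proto = inner[9]
    result = {
        "reason": UNREACH_CODES.get(icmp[1], "code %d" % icmp[1]),
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "src_ip": socket.inet_ntoa(inner[12:16]),
        "dst_ip": socket.inet_ntoa(inner[16:20]),
    }
    # TCP and UDP both start with source port and destination port.
    if proto in PROTOCOLS and len(inner) >= header_len + 4:
        sport, dport = struct.unpack("!HH", inner[header_len:header_len + 4])
        result["src_port"], result["dst_port"] = sport, dport
    return result
```

The returned addresses and ports are those of the original request, so they can be used as a lookup key for the flow or transaction that should be marked as failed.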


(Steffen Siering) #5

Hi,

> If you send a request, no matter what protocol, and expect a reply (DNS query / DNS reply, or ICMP echo request / ICMP echo reply) but the reply never comes, that should be noted after some reasonable timeout.

This is somewhat specific to the protocol analyzer. Some analyzers report a request that timed out without a response and others do not. Introducing a consistent (maybe configurable) way of reporting timed-out requests would be another feature request/bug report.

For TCP we can fortunately deal with connection status in the TCP layer itself (meaning ICMP unreachable messages), but for UDP it might be a little more tricky. For some protocols we would have to extend the plugins to support some unreachable callback in order to kill the transaction early. But this definitely sounds reasonable to me (plus, we don't have to keep state until timeout).
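
Roughly, the idea of an unreachable callback could look like the sketch below. It is written in Python for brevity and is purely illustrative: `PendingRequests`, `on_unreachable`, and the tuple layout of the published events are hypothetical names, not an existing Packetbeat plugin interface.

```python
class PendingRequests:
    """Tracks requests that are waiting for a response (hypothetical sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds to wait before reporting a timeout
        self.pending = {}       # flow key -> time the request was seen
        self.published = []     # (flow key, status) events, in publish order

    def on_request(self, key, now):
        self.pending[key] = now

    def on_response(self, key):
        if self.pending.pop(key, None) is not None:
            self.published.append((key, "ok"))

    def on_unreachable(self, key, reason):
        # Called when an ICMP error refers to this flow: finish the
        # transaction immediately instead of keeping state until the timeout.
        if self.pending.pop(key, None) is not None:
            self.published.append((key, "unreachable: " + reason))

    def expire(self, now):
        # Report requests that never got any answer.
        for key, started in list(self.pending.items()):
            if now - started >= self.timeout:
                del self.pending[key]
                self.published.append((key, "timeout"))
```

The point of the design is that each transaction is published exactly once, with a status that says whether it was answered, actively rejected, or silently dropped.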

Sounds like two more enhancement requests to me. What do you think?

I created an issue collecting ideas/tasks for enhancing flows: https://github.com/elastic/beats/issues/3444

Thanks for your input. I've been thinking about some of these features for a while, but your input and finally writing down some enhancement requests is a great step forward.


(Florin Andrei) #6

I had a look at the issue you've opened on github and I like what I see there so far. Thank you.

Whether all this should be split into multiple enhancement requests - that's up to the Beats dev team. It's a lot of changes, so I'm sure it will take a while.


(Steffen Siering) #7

One more enhancement request. This should help with debugging lower level protocols.

(One day I will add ARP support).

