DNS Protocol and reading PCAP files

(Brian Kithen) #1


I'm currently working in a company where we intend to use PacketBeat with ELK. It will initially only deal with DNS packets.
We don't have any real experience with ELK, so it's kind of a challenge. I'm testing the latest version of PacketBeat that supports DNS over UDP.
The live capture mode works pretty well, except maybe for a behaviour I'm not sure is expected: whenever packets are replayed by a client wishing to resolve a domain name, the ElasticSearch database only contains a few of them.

For example, a client ( queries to resolve a domain name :

dig +tries=3 @ google.com
;; connection timed out; no servers could be reached

Since is not a DNS Server, it is not listening on port 53 and that is why DNS packets are replayed 3 times by the client. 'Replayed' means the packets stay exactly the same (including the DNS ID) for all the requests. However, duplicate packets can also happen on any real DNS Server listening on port 53 and we want to make sure that every packet is sent to ElasticSearch, even if they are duplicates.

On, tcpdump shows that the 3 requests are received:

15:59:27.821199 IP > 53727+ A? google.com. (28)
15:59:32.820852 IP > 53727+ A? google.com. (28)
15:59:37.820623 IP > 53727+ A? google.com. (28)
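
To illustrate why the three retries are indistinguishable on the wire, here is a minimal sketch that builds the query by hand. The helper name build_dns_query is made up for this example; the layout follows the standard DNS message format, and the ID 53727 and the 28-byte length are taken from the tcpdump output above.

```python
import struct

def build_dns_query(txid: int, name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: one question, recursion desired."""
    # Header: ID, flags (RD=1 -> 0x0100, shown as "+" by tcpdump),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)

first = build_dns_query(53727, "google.com")
retry = build_dns_query(53727, "google.com")
print(len(first))      # 28, the "(28)" in the tcpdump lines
print(first == retry)  # True: a retry carries the same ID and payload
```

Since the payloads are byte-identical, only the capture timestamp tells the retries apart.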

The problem is that only 2 of these requests appear in ElasticSearch.

It is worse when reading a PCAP file, because then only 1 packet of the previous dig command appears in ElasticSearch.

Finally, I have two last questions:
Is it normal that, when reading a PCAP file, the packet data sent to ElasticSearch does not carry the timestamp from the PCAP file but the time at which PacketBeat sent the JSON-formatted data to ElasticSearch?
We would like to have the original packet information in ElasticSearch.
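
For reference, the original capture times are stored in the PCAP file itself, in each per-packet record header. The sketch below reads them from a classic (microsecond) pcap byte string. The function name pcap_timestamps, the in-memory sample file and the base epoch value are made up for illustration; only the sub-second parts come from the tcpdump output above.

```python
import struct
from datetime import datetime, timezone

def pcap_timestamps(data: bytes):
    """Yield the capture time of every record in a classic pcap byte string."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # little-endian file, microsecond resolution
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # big-endian file, microsecond resolution
    else:
        raise ValueError("not a classic microsecond pcap file")
    offset = 24        # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        yield datetime.fromtimestamp(ts_sec, tz=timezone.utc).replace(
            microsecond=ts_usec)
        offset += 16 + incl_len

# Hypothetical sample: a global header followed by three 28-byte records.
BASE = 1_440_000_000   # arbitrary epoch second, not from the thread
sample = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
for sec, usec in [(0, 821199), (5, 820852), (10, 820623)]:
    sample += struct.pack("<IIII", BASE + sec, usec, 28, 28) + bytes(28)

for ts in pcap_timestamps(sample):
    print(ts.isoformat())
```

Whether a tool reports these values or its own processing time is a choice made by the reader, not a limitation of the file format.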

Also, the '-t' option for reading PCAP doesn't put any packet in ElasticSearch even though PacketBeat prints 'dns.go:368: DBG Publishing transaction. DnsTuple src[] dst[] transport[udp] id[24503]'. I tried the '-t' option with another protocol (http on port 80) but the result stays the same: no packets appeared in ElasticSearch.

Content of the PacketBeat config file :

    refresh_topology_freq: 10
    topology_expire: 15

    device: eth0
    type: pcap

        ports: [53]

        enabled: true
        port: 9200
        save_topology: true

With thanks in advance

(Steffen Siering) #2
  1. duplicate handling:
    If a duplicate request (without response) is detected, the current transaction (containing the request only) can be published. The last seen request will be correlated with the response (if the request is sent before the response is received).
    The published message will contain a note saying "Another query with the same DNS ID from this client..."
    Have you tried packetbeat with '-dump'? This option will write a pcap of packets seen by packetbeat.
    Optionally add options '-v -d "udp,dns,publish"' to packetbeat to get some debug output.

  2. pcap timestamps:
    Timestamps are generated by the underlying sniffer. For pcap it should be libpcap. But I need some more time to figure out why/when which timestamps are used for pcaps. If you run packetbeat with '-dump', the generated pcap reports the timestamps generated by the sniffer module.

  3. no output with '-t' (or only 1 when replaying pcap)
    Can you add elasticsearch debug output via '-d "elasticsearch,output_elasticsearch"'?

(Andrew Kroh) #3

@Brian_KITHEN, the issue you describe about only seeing 2 of 3 DNS requests logged in Elasticsearch is caused by the third request timing out without being reported. After 10 seconds, if no DNS response is received, the request is silently timed out.

I have created a pull request to address this issue.
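
The combined effect of the duplicate handling from reply #2 and the silent timeout can be reproduced with a toy model. This is only a sketch and not Packetbeat's code: the function replay and the publish_on_timeout switch are hypothetical, and the switch shows one possible behaviour rather than what the pull request actually does. The timestamps are the seconds from the tcpdump output in post #1.

```python
TIMEOUT = 10.0  # seconds, the value quoted above

def replay(events, end_of_capture, timeout=TIMEOUT, publish_on_timeout=False):
    """events: list of (time, kind, dns_id), kind is "query" or "response".
    Returns the published transactions as (query_time, note) pairs."""
    pending = {}    # dns_id -> time of the query still waiting for a response
    published = []

    def expire(now):
        for dns_id, started in list(pending.items()):
            if now - started >= timeout:
                del pending[dns_id]
                if publish_on_timeout:
                    published.append((started, "no response (timed out)"))

    for now, kind, dns_id in sorted(events):
        expire(now)
        if kind == "query":
            if dns_id in pending:
                # A duplicate pushes out the older request-only transaction.
                published.append((pending[dns_id], "duplicate query seen"))
            pending[dns_id] = now
        elif kind == "response" and dns_id in pending:
            published.append((pending.pop(dns_id), "answered"))
    expire(end_of_capture)
    return published

queries = [(27.821199, "query", 53727),
           (32.820852, "query", 53727),
           (37.820623, "query", 53727)]

print(len(replay(queries, 60.0)))                           # 2
print(len(replay(queries, 60.0, publish_on_timeout=True)))  # 3
```

In this model the first two requests are published when their duplicate arrives, and only the last one depends on what happens at timeout.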

(Brian Kithen) #4

Thank you for those answers !
I built PacketBeat with @andrewkroh updates and the live capture works like a charm.

I'm concerned about reading PCAP files because that's what we plan to use in production. I did some more testing and found part of the issue. First, I tried to process some of your PCAP files.
Running the command packetbeat -e -d "*" -I tests/pcaps/dns_additional.pcap reports that the packets are published. Yet ElasticSearch stays empty. TcpDump shows that the last packet sent by packetbeat is partial. It seems the program exits too soon, without letting the TCP connection finish.
That explains how at least one packet is lost.

In the case of the PCAP of the 3 packets from the dig command given earlier, the first query is processed but not published (according to PacketBeat debug mode), and the second and third are published. The third (i.e. the last one) appears partial in TcpDump.
That is why only one out of the three is in ElasticSearch, but I don't know why the first one isn't published. Any idea?

Btw, the '-t' option works on large PCAP files. If the PCAP contains only a few packets, they are all affected by the program exiting too soon. In a large PCAP, the last ones are still affected.

(Steffen Siering) #5

Oh, right. So the publisher is asynchronous. Plus, for the elasticsearch output, the transactions are buffered for some time in order to use the bulk API for improved throughput. When you replay just one (short) pcap file, transactions to be published might still be buffered and not written yet. This one hit me during testing too. I introduced another option '-waitstop=num seconds' to packetbeat. That is, packetbeat will wait "num seconds" before quitting. Alternatively you can try to configure the elasticsearch output module:

At the moment packetbeat is mostly designed for live traffic capture. What exactly do you want to achieve? Maybe we can find a better solution.

No idea why first one is not published when replaying from pcap. You've got some debug log output to share with us?
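
To make the buffering effect described above concrete, here is a deterministic toy model of a bulk buffer that flushes when it is full or when a flush interval has elapsed. It is a sketch under assumptions: the class BulkBuffer, the function run, and the values bulk_size=50 and flush_interval=1.0 are invented for this example and are not Packetbeat settings.

```python
class BulkBuffer:
    """Toy output that sends events in batches (hypothetical, simplified)."""
    def __init__(self, bulk_size=50, flush_interval=1.0):
        self.bulk_size = bulk_size
        self.flush_interval = flush_interval
        self.buffer = []
        self.last_flush = 0.0
        self.indexed = []   # stands in for documents that reached the index

    def publish(self, event, now):
        self.buffer.append(event)
        self.tick(now)

    def tick(self, now):
        full = len(self.buffer) >= self.bulk_size
        due = now - self.last_flush >= self.flush_interval
        if self.buffer and (full or due):
            self.indexed.extend(self.buffer)
            self.buffer.clear()
            self.last_flush = now

def run(n_events, waitstop):
    """Read n_events from a file, then wait `waitstop` seconds before exit."""
    out = BulkBuffer()
    now = 0.0
    for i in range(n_events):
        now += 0.001            # reading from a file is fast
        out.publish(f"event-{i}", now)
    out.tick(now + waitstop)    # whatever is still buffered after this is lost
    return len(out.indexed)

print(run(3, waitstop=0))     # 0: everything is still buffered at exit
print(run(3, waitstop=3))     # 3
print(run(120, waitstop=0))   # 100: only the tail of a large file is lost
print(run(120, waitstop=3))   # 120
```

This matches the observation that small captures lose everything while large captures only lose their last packets.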

edit: formatting.

(Brian Kithen) #6

Thanks for the reply.

We capture packets with ULOG and iptables. Packets are saved in PCAP files. There are multiple PCAP files and each one categorises the packets. For example, if the packet has a blacklisted source IP, we save it in the blacklisted.pcap file. There are multiple rules like that.
PCAP files would then be synced and parsed with PacketBeat on another server.

We want to avoid live capture (and it's not my call, I suggested otherwise) to keep those rules and avoid more CPU usage on a production server.

Sure, I uploaded dns_dig_capture.pcap and a log of the execution of packetbeat -waitstop 3 -t -e -d "publish,dns,elasticsearch,output_elasticsearch" -I dig_capture.pcap.

Parsing of the first packet displays:

dns.go:380: DBG Publishing transaction. DnsTuple src[] dst[] transport[udp] id[15947]
client.go:58: DBG send event
async.go:71: DBG send async event

But then no debug message is printed from publish.go for this packet.

(Steffen Siering) #7

So you're basically using packetbeat for additional analysis of captured traffic. Security use-case?

The problem is that packetbeat's internal timeout timers are based on system timers. When replaying a pcap with -t, your results will be skewed because the timers and the packet processing are basically working on different time scales. I've got ideas to fix this, but this one is a bigger change.

Plus: when do you replay your pcaps? Are these pcaps rotated somehow? By replaying files you create a kind of windowing effect (a window runs from the first to the last packet in a pcap). This windowing can break your analysis because a transaction may start in window1 and end in window2.
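
A small sketch of the windowing effect, assuming fixed-length capture files processed independently. The function split_transactions, the one-hour window and the sample times are hypothetical and only serve to show which transactions would straddle a file boundary.

```python
def split_transactions(transactions, window=3600):
    """transactions: list of (request_time, response_time) in seconds.
    Returns (complete, broken); broken ones straddle a file boundary."""
    complete, broken = [], []
    for req, resp in transactions:
        same_file = int(req // window) == int(resp // window)
        (complete if same_file else broken).append((req, resp))
    return complete, broken

sample = [(10.0, 10.02), (3599.99, 3600.01), (7000.0, 7000.5)]
complete, broken = split_transactions(sample)
print(len(complete))  # 2
print(broken)         # [(3599.99, 3600.01)]
```

A request and its response in `broken` would be seen by two separate replays, so neither run could correlate them.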

Is there an option to somehow forward packets to packetbeat from within your packet filters? I can think of a sniffer module in packetbeat accepting packets e.g. via TCP, so you can just forward your pcaps' content to a running packetbeat instance. With the filtering you have in place you can run one packetbeat per category, for example. Alternatively (instead of one packetbeat per category) some packet tagging/coloring would be great. The tags added to a packet would then be published to elasticsearch. Just some ideas.

For these reasons - 1. timers and 2. incomplete windowing - I would not recommend working with pcap files in production. The first one I can fix in code. The second one is introduced by your processing, but can be worked around by implementing the solution for 1 and using packet forwarding, as described earlier, to an always-running packetbeat instance.

I haven't had a chance to look at the debug output yet. Will do later.

(Brian Kithen) #8


New PCAP files are saved hourly. We saved months of PCAP that ideally we would like to include in ElasticSearch.
If we were to lose transactions because of the windowing problem, that wouldn't be too much of an issue for us, mainly because most requests are UDP DNS 1 request - 1 answer.

I suggested replacing PCAP files with forwarding packets to local loopbacks where PacketBeat would be listening. My superior didn't agree to it because it would require editing complex firewalls and the increase of CPU usage would probably be a problem.

If we were to replay packets (e.g. with tcpreplay) from PCAP files to the local loopback, it would work except that the timestamps wouldn't be preserved (right?).

About the internal timers problem with PCAP, I would be ready to give any help, even if I never programmed using Golang before. How long do you think it would take ?

(Steffen Siering) #9

When you care about timestamps I'm not a big fan of tcpreplay:
If you don't have much traffic in the pcap it's okish, but if you have too many packets in short intervals, tcpreplay adds quite some skew, breaking your timestamps. tcpreplay tries to adjust the sending rates, but from my experience with tcpreplay it is not optimal. But in general replaying into loopback will mess up timestamps anyway, as the kernel assigns new timestamps for you.

The internal timestamp problem is not as simple (at least for the solution I envision). We would need a timer basically driven by packet timestamps. On top of this timer, a timer system for doing timeouts is required. Plus the protocol analysers are currently driven by the sniffer, so in order to drive the timer when no packets are received (the general use case; in your use case you would want to disable driving the timer without packets), you either need to integrate the timer system with the sniffer (using timeouts on poll, for example) or forward all packets (some batching would be better for performance, though) to a worker loop driving the clock and the protocol analysers.
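
The idea of a timer driven by packet timestamps can be sketched as follows. This is only an illustration of the concept, not a design for Packetbeat: the class PacketClock and its methods are hypothetical, and the sketch ignores the case where the clock must also advance without packets.

```python
import heapq

class PacketClock:
    """A clock that only moves when it is fed packet timestamps."""
    def __init__(self):
        self.now = None
        self._timers = []   # heap of (deadline, sequence, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def after(self, delay, callback):
        """Schedule callback(deadline) `delay` seconds of packet time from now."""
        if self.now is None:
            raise RuntimeError("clock has not seen a packet yet")
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, packet_ts):
        """Move time forward to packet_ts and fire every timer that is due."""
        if self.now is None or packet_ts > self.now:
            self.now = packet_ts
        while self._timers and self._timers[0][0] <= self.now:
            deadline, _, callback = heapq.heappop(self._timers)
            callback(deadline)

clock = PacketClock()
fired = []
clock.advance(100.0)               # first packet in the capture
clock.after(10.0, fired.append)    # e.g. a 10 second transaction timeout
clock.advance(105.0)               # next packet: nothing is due yet
print(fired)                       # []
clock.advance(111.0)               # a later packet moves time past the deadline
print(fired)                       # [110.0]
```

With such a clock a timeout expires after 10 seconds of capture time, no matter how fast the file is read.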

I haven't had a chance to look into the pcap file timestamping yet, but maybe for working with pcaps a custom pcap sniffer (actually very easy to do) can be helpful doing the timestamping of packets (possible modes: system time, pcap packet time, pcap packet time synced to system time). The looping and '-t' options should be mostly handled by the sniffer type.
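
The three timestamping modes mentioned above could look roughly like this. It is a sketch built on an assumption: "pcap packet time synced to system time" is interpreted here as shifting the capture so that its first packet maps to the moment replay starts while the original spacing is kept. The class Timestamper and the mode names are invented for this example.

```python
import time

class Timestamper:
    """Pick the time attached to each packet read from a capture file."""
    def __init__(self, mode, wall_clock=time.time):
        if mode not in ("system", "pcap", "pcap_synced"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.wall_clock = wall_clock
        self._offset = None   # wall-clock start minus first packet time

    def stamp(self, packet_ts):
        if self.mode == "pcap":
            return packet_ts              # keep the recorded capture time
        if self.mode == "system":
            return self.wall_clock()      # time of processing
        # pcap_synced: anchor the first packet to "now", keep original spacing
        if self._offset is None:
            self._offset = self.wall_clock() - packet_ts
        return packet_ts + self._offset

fake_now = lambda: 1000.0                 # fixed clock so the demo is repeatable
synced = Timestamper("pcap_synced", wall_clock=fake_now)
print(synced.stamp(27.5))                 # 1000.0
print(synced.stamp(32.5))                 # 1005.0
print(Timestamper("pcap").stamp(27.5))    # 27.5
```

For importing months of stored captures, the plain "pcap" mode is the one that would preserve the original packet times.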

It's all doable, but will take some time to implement.
