I've just installed Packetbeat on our Google Compute Engine instance, and I have been watching a day's worth of visualizations.
The "Errors vs successful transactions [Packetbeat] ECS" visualization shows a much higher error rate than I expected, around 40-60% errors. Most of these come from process.name: systemd-resolved, with two distinct error.message values:
Another query with the same DNS ID from this client was received so this query was closed without receiving a response
Response: received without an associated Query
My google-fu is failing me: I can find nothing on either of these error messages.
Has anyone else noticed similar problems?
Is this something wrong with my Packetbeat configuration?
Or is this something I need to investigate in systemd-resolved?
Based on your description of the two errors, I suspect that systemd-resolved is re-sending DNS requests (perhaps when it doesn't receive a response within some timeout). Packetbeat's DNS protocol analyzer has a fairly simple state machine: it receives a request and expects a response. In this case it gets a request, then a second request with the same DNS ID (which overwrites the first), then a DNS response, and then another response. Packetbeat has no request left to match that last response with, because the first response already closed the state for that ID.
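To make the suspected sequence concrete, here is a hypothetical Python model of a one-slot-per-ID state machine like the one described above. It is not Packetbeat's actual code (Packetbeat is written in Go); the event format and the name `simulate` are invented for illustration, and the model keys only on the DNS ID, whereas the wording "from this client" suggests the real analyzer also takes the client into account.

```python
# Hypothetical model, NOT Packetbeat's implementation: one pending slot per
# DNS ID. A new query with the same ID closes the old one unanswered, and a
# response closes whatever query is pending for its ID.

DUPLICATE_QUERY = ("Another query with the same DNS ID from this client was "
                   "received so this query was closed without receiving a response")
ORPHAN_RESPONSE = "Response: received without an associated Query"


def simulate(events):
    """events: iterable of (kind, dns_id), kind being "query" or "response".
    Returns the error messages this simplified model would emit."""
    pending = set()  # DNS IDs with a query awaiting its response
    errors = []
    for kind, dns_id in events:
        if kind == "query":
            if dns_id in pending:
                # The earlier query is closed without a response and replaced.
                errors.append(DUPLICATE_QUERY)
            pending.add(dns_id)
        elif kind == "response":
            if dns_id in pending:
                pending.remove(dns_id)
            else:
                errors.append(ORPHAN_RESPONSE)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return errors


# A query, a retransmitted query, then two responses, all with DNS ID 0x1234.
trace = [("query", 0x1234), ("query", 0x1234),
         ("response", 0x1234), ("response", 0x1234)]
for message in simulate(trace):
    print(message)
```

Running it prints the duplicate-query message followed by the orphan-response message, i.e. exactly the pair of errors reported above.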
If you capture a PCAP trace of the DNS traffic on port 53 (sudo tcpdump -i <interface> -w dns-capture.pcap port 53), you can confirm this by examining the request and response traffic in a tool like Wireshark. If the PCAP data does not support this theory, then it could be some other issue, such as dropped packets, in which case we can look at optimizing your configuration (for example, using the af_packet sniffer type, plus a few other things).
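If Wireshark is not handy, a short script can count the two situations directly from the capture. This is a minimal sketch under stated assumptions: it reads only classic libpcap files (not pcapng), handles only Ethernet and Linux "cooked" (tcpdump -i any) link types, IPv4 and UDP (no VLAN tags, IPv6 or DNS over TCP), and pairs queries with responses by client address, client port and DNS ID. The script and function names are hypothetical.

```python
import socket
import struct
import sys
from collections import defaultdict

LINKTYPE_ETHERNET = 1
LINKTYPE_LINUX_SLL = 113  # Linux "cooked" capture, typical for `tcpdump -i any`


def iter_frames(path):
    """Yield (linktype, frame_bytes) for each record in a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("file too short to be a pcap capture")
        magic = header[:4]
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"  # little-endian (microsecond or nanosecond variant)
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"  # big-endian
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        while True:
            record = f.read(16)
            if len(record) < 16:
                return
            # Record header: ts_sec, ts_frac, captured length, original length.
            _, _, incl_len, _ = struct.unpack(endian + "IIII", record)
            yield linktype, f.read(incl_len)


def dns_from_frame(linktype, frame):
    """Return (src, sport, dst, dport, dns_id, is_response) or None."""
    if linktype == LINKTYPE_ETHERNET and len(frame) >= 14:
        ethertype, ip = struct.unpack("!H", frame[12:14])[0], frame[14:]
    elif linktype == LINKTYPE_LINUX_SLL and len(frame) >= 16:
        ethertype, ip = struct.unpack("!H", frame[14:16])[0], frame[16:]
    else:
        return None
    if ethertype != 0x0800 or len(ip) < 20 or ip[9] != 17:
        return None  # only IPv4 carrying UDP (protocol 17)
    udp = ip[(ip[0] & 0x0F) * 4:]
    if len(udp) < 8 + 12:
        return None  # need the UDP header plus the fixed 12-byte DNS header
    sport, dport = struct.unpack("!HH", udp[:4])
    dns_id, flags = struct.unpack("!HH", udp[8:12])
    is_response = bool(flags & 0x8000)  # QR bit
    return (socket.inet_ntoa(ip[12:16]), sport,
            socket.inet_ntoa(ip[16:20]), dport, dns_id, is_response)


def check(events):
    """Count re-sent queries and responses that have no open query."""
    pending = defaultdict(int)  # (client ip, client port, dns_id) -> open queries
    resent, unmatched = 0, 0
    for src, sport, dst, dport, dns_id, is_response in events:
        if is_response:
            key = (dst, dport, dns_id)  # a response is addressed to the client
            if pending[key] > 0:
                pending[key] -= 1
            else:
                unmatched += 1
        else:
            key = (src, sport, dns_id)
            if pending[key] > 0:
                resent += 1  # another query with this key is still unanswered
            pending[key] += 1
    return resent, unmatched


if __name__ == "__main__" and len(sys.argv) > 1:
    parsed = (dns_from_frame(lt, fr) for lt, fr in iter_frames(sys.argv[1]))
    resent, unmatched = check(p for p in parsed if p is not None)
    print(f"queries re-sent before a response: {resent}")
    print(f"responses with no open query: {unmatched}")
```

Run it as python3 dns_pairs.py dns-capture.pcap (the script name is hypothetical). A non-zero first count would support the retransmission theory; responses with no open query but no re-sent queries might instead point to something like dropped packets.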
I think it would be possible to improve the state machine in Packetbeat's DNS protocol analyzer to avoid these errors by adding a counter to track the number of pending requests.
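A minimal sketch of the counter idea, as a hypothetical Python model rather than a patch to Packetbeat: each DNS ID keeps a count of outstanding queries instead of a single slot, so a retransmitted query followed by two responses no longer produces either error. The event format (("query" or "response", dns_id) tuples) and the name `simulate_with_counter` are illustrative assumptions.

```python
from collections import Counter

ORPHAN_RESPONSE = "Response: received without an associated Query"


def simulate_with_counter(events):
    """events: iterable of (kind, dns_id), kind being "query" or "response".
    Returns (errors, outstanding), outstanding mapping ID -> unanswered queries."""
    pending = Counter()  # dns_id -> number of queries still awaiting a response
    errors = []
    for kind, dns_id in events:
        if kind == "query":
            pending[dns_id] += 1
        elif kind == "response":
            if pending[dns_id] > 0:
                pending[dns_id] -= 1  # the response answers one outstanding query
            else:
                errors.append(ORPHAN_RESPONSE)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    outstanding = {k: v for k, v in pending.items() if v > 0}
    return errors, outstanding


# The same retransmission pattern: query, query, response, response.
trace = [("query", 0x1234), ("query", 0x1234),
         ("response", 0x1234), ("response", 0x1234)]
print(simulate_with_counter(trace))  # ([], {})
```

One consequence of this design: a query that never gets a response stays counted indefinitely in this sketch, so a real implementation would still need a timeout to expire stale entries.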
It doesn't look like it's resending the query; I can see a matching response for every request.
It is, however, asking for resolution of the same name repeatedly in quick succession. Why it isn't caching these queries is another question.
Now, I think I have the same information displayed in Kibana.
Unfortunately I don't have the technical background to do anything about it.
I'm happy to try running debug versions of Packetbeat to check whether they fix it.
Our setup on GCP is relatively new: we are running in zone australia-southeast1-b on an n1-standard-2 (2 vCPUs, 7.5 GB memory), so I expect that spinning up a Compute Engine instance with the right OS should reproduce the problem.
As I have a workaround (running with pcap), I'll live with that for now.