Hoping someone can offer some guidance. Since we upgraded LS to 6.7.0, we've consistently seen issues with LS resolving DNS entries, which appear to build up over time, e.g.:
[logstash.filters.dns ] DNS: timeout on resolving address. {:field=>"[destination-address]", :value=>"x.x.x.x"}
It's normal for us to see 'some' addresses which are unresolvable, but what appears to happen is that over time (say 1-2 hours) LS logs more and more of these problems until eventually it stops processing ingested messages, causing ES/Kibana dashboards to miss data.
If we restart the LS service, it's fine again, but only for a few hours. It doesn't matter whether it's the middle of peak period, or in the middle of the night.
As I said, it was fine before the upgrade, and locally on the LS server we can run nslookup/dig to resolve addresses just fine.
We're ingesting logs from various sources - Bro IDS and Blue Coat via rsyslog or filebeat, along with Juniper, Meru, Pulse Secure logs via Syslog
Any ideas? Anyone else having a similar issue?
There's no error logged for this by LS. It 'just' stops processing until restart. With Metricbeat and Filebeat that's fine, but with syslog we end up missing data.
We ran into the same problem after updating to ELK 6.7.
About one hour after restarting Logstash we get the same error messages for (it looks like) every log line processed by Logstash, so we're seeing quite huge delays. The only thing that works so far is to disable DNS PTR lookups completely in our filters.
After restarting Logstash everything runs smoothly for a while. We first thought the issue might be this https://github.com/logstash-plugins/logstash-filter-dns/issues/40 but it seems our resolv.rb is already correctly patched.
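For context, "disabling DNS PTR lookups" just means removing or commenting out the dns filter block that does the reverse lookups. A minimal sketch of what such a block looks like (the field name and timeout value here are placeholders, not our actual config):

```conf
filter {
  dns {
    reverse => ["[destination-address]"]   # PTR lookup on the IP in this field
    action  => "replace"                   # overwrite the field with the resolved name
    timeout => 2                           # seconds before "timeout on resolving address" is logged
  }
}
```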
Heads up to say that I believe I just found the cause of this regression and it relates to a library update to the resolver code. I will be following up shortly but for now the only workaround is to downgrade to the latest 6.6 series (6.6.2 as of now). Just downgrading the dns filter plugin version will not help.
I may have a temporary workaround I'd like to validate, any feedback appreciated.
If you are using the nameserver option of the dns filter with multiple hosts configured, OR you are not using the nameserver option but have multiple servers configured in /etc/resolv.conf,
then as a tentative temporary workaround you can try using a SINGLE server in either the nameserver option or in your /etc/resolv.conf and see if that solves it.
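To make the workaround concrete, here is a hedged sketch of a dns filter pinned to a single resolver (the IP address and field name are placeholders):

```conf
filter {
  dns {
    reverse    => ["[destination-address]"]
    action     => "replace"
    nameserver => ["10.0.0.53"]   # a SINGLE server, instead of multiple entries
  }
}
```

Alternatively, leave the nameserver option out entirely and trim /etc/resolv.conf down to a single `nameserver` line.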
Thanks for the updates, and confirmation there's a problem to fix.
I've applied your logic to our test cluster and am waiting for some log entries. We have multiple entries in resolv.conf, so I've switched the LS conf files to use only one nameserver.
I have had some timeouts, but nowhere near as many as before. It probably needs a couple of hours, though.
One oddity is this, however:
[2019-04-25T20:19:43,370][WARN ][logstash.filters.dns ] DNS: timeout on resolving address. {:field=>"[destination-address]", :value=>"209.112.114.33"}
root@elk00:~# nslookup
> 209.112.114.33
;; Truncated, retrying in TCP mode.
33.114.112.209.in-addr.arpa name = k4.nstld.com.
33.114.112.209.in-addr.arpa name = l4.nstld.com.
33.114.112.209.in-addr.arpa name = a22.verisigndns.com.
33.114.112.209.in-addr.arpa name = f4.nstld.com.
33.114.112.209.in-addr.arpa name = ns2.euro909.com.
33.114.112.209.in-addr.arpa name = a23.verisigndns.com.
33.114.112.209.in-addr.arpa name = ns0.netnames.net.
33.114.112.209.in-addr.arpa name = ns1.netnames.net.
33.114.112.209.in-addr.arpa name = ns1.ascio.net.
33.114.112.209.in-addr.arpa name = ns2.domainnetwork.se.
33.114.112.209.in-addr.arpa name = ns3.ascio.net.
33.114.112.209.in-addr.arpa name = ns2.dnsvisa.com.
33.114.112.209.in-addr.arpa name = g4.nstld.com.
33.114.112.209.in-addr.arpa name = a21.verisigndns.com.
33.114.112.209.in-addr.arpa name = ns2.webipdns.com.au.
33.114.112.209.in-addr.arpa name = ns3.netnames.net.
33.114.112.209.in-addr.arpa name = ns5.netnames.net.
33.114.112.209.in-addr.arpa name = a2.verisigndns.com.
33.114.112.209.in-addr.arpa name = pdns1.cscdns.net.
33.114.112.209.in-addr.arpa name = indom30.indomco.fr.
33.114.112.209.in-addr.arpa name = ns7.netnames.net.
33.114.112.209.in-addr.arpa name = indom10.indomco.com.
33.114.112.209.in-addr.arpa name = dns1.cscdns.net.
33.114.112.209.in-addr.arpa name = indom130.indomco.org.
Not sure if the switch to TCP might have caused the filter a problem?
Thanks @millap for the follow-up. The timeout on 209.112.114.33 seems unrelated; when this bug is triggered, most if not all requests will time out, and an occasional timeout is normal.