DNS filter performance issue?

yuphing · June 12, 2015, 8:05am

I'm trying to use the DNS filter to do reverse DNS lookups for various IPs in netflow data, like so:

dns {
  reverse => [ "ipv4_src_hostname", "ipv4_dst_hostname" ]
  action => "replace"
}

The filter is working, i.e. the ipv4_(src|dst)_hostname fields, which initially contain the IP addresses, are replaced by the appropriate hostnames where available. However, my redis list, which is fed by a local logstash instance and consumed by 4 separate logstash nodes+indexers, starts growing (llen normally reports 0) and never drains back to 0.

If I remove this filter, the list quickly drains back to 0 again.

I've tried the following to see whether I could attenuate this behaviour:

increased logstash worker threads with -w 2 or 4 (default is 1)
installed dnsmasq with:

cache-size=6000
dns-forward-max=600
all-servers

Running host -v against an IP resolves the hostname from 127.0.0.1#53 in 0 ms on average.

In case the problem was resolving external IPs, I tried doing the reverse lookup only for RFC1918 (and loopback) addresses, i.e.

# Dots are escaped so that e.g. 100.x.x.x no longer matches the 10. prefix
if [ipv4_src_addr] =~ /^(127\.0\.0\.1|10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)/ {
  dns {
    reverse => [ "ipv4_src_hostname" ]
    action => "replace"
  }
}

All to no avail.
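
Prefix matching on dotted strings is easy to get subtly wrong: an unescaped `.` in a regex matches any character, so a pattern such as `^10.` would also accept `100.64.0.1`. As a cross-check, here is a minimal Python sketch of the same internal-address test using network membership instead of a regex. The names `INTERNAL_NETWORKS` and `is_internal` are made up for illustration, and the loopback range is widened to all of 127.0.0.0/8, which is an assumption rather than what the conditional above does.

```python
import ipaddress

# RFC 1918 private ranges, plus loopback.
# Assumption: all of 127.0.0.0/8 is treated as internal, not just 127.0.0.1.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_internal(ip):
    """Return True if ip is a valid IPv4 address in one of INTERNAL_NETWORKS."""
    try:
        addr = ipaddress.IPv4Address(ip)
    except ValueError:
        # Not a parseable IPv4 address, so certainly not internal.
        return False
    return any(addr in net for net in INTERNAL_NETWORKS)
```

A script like this could be used to spot-check which addresses in a sample of events the Logstash conditional ought to be selecting.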

Is there a known issue with the DNS filter doing reverse DNS lookups and having poor performance, or am I seriously doing something wrong?

[EDIT] I am using logstash 1.5 and elasticsearch 1.6. My architecture is:

netflow producers -> (logstash -> redis) -> logstash*4 -> elasticsearch*4

Joshua_Rich · June 15, 2015, 6:34am

The slowness is mentioned in the filter documentation. With netflow data, I'd recommend you avoid doing DNS lookups in Logstash: store the raw IP, and do the lookups only at analysis time, on the filtered subset of data you are actually looking at.

yuphing · June 15, 2015, 7:18am

Thanks Joshua. After more testing over the weekend, DNS lookup for local hosts is working fast enough for me to use that portion, so I will stick with that for now, and exclude external hosts.

Perhaps a batch process that queries ES for non-resolved hostnames and updates them once/day might be the way to go for these external hostnames, which I will assume may not change that often.
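
A minimal sketch of the lookup half of such a batch job, assuming the list of unresolved IPs has already been pulled out of Elasticsearch (the query and the update step are deliberately left out). It resolves each distinct address only once and runs the lookups in parallel, since they are latency-bound. `resolve_unique`, `reverse_lookup` and the `max_workers` default are illustrative choices, not anything from Logstash or Elasticsearch.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def resolve_unique(ips, lookup=reverse_lookup, max_workers=16):
    """Resolve each distinct IP once; return {ip: hostname or None}."""
    unique = sorted(set(ips))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its input.
        return dict(zip(unique, pool.map(lookup, unique)))
```

The returned mapping can then drive whatever update mechanism is used against the index; passing a different `lookup` callable makes it possible to test the job without touching DNS.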

Norberto_Meijome · June 15, 2015, 7:36am

You could always process these log lines before making them available to Logstash... though a scrubbing process which changes IPs to FQDNs (or, even better, updates a separate field) might be better, as you'd have the data more readily available at first.

Joshua_Rich · June 16, 2015, 2:26am

Yeah, pre or post-processing of the data to do the DNS lookups sounds like the best option here.

yuphing · June 18, 2015, 8:35am

Hmm, to pre-process, it'll have to be somewhere between input -> consumption in my pipeline, marked #1 to #3 in the 3 places below:
netflow producers -> (logstash #1-> redis #2) #3-> logstash*4 -> elasticsearch*4

Ideally it would be logstash doing that work, since it's already reading it either at #1 or #3, but the reverse DNS lookup is very slow, and I'm uncertain if that's a logstash problem, or simply just the lookup latency. As a point to note, when I changed the DNS lookup to be at #1 in the pipeline, the number of netflow packets captured was only ~20% of expected, so I can guesstimate the slowdown to be ~5x (i.e. about 80% of packets ended up being dropped at the UDP input phase).
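
The ~5x estimate can be written out as a small calculation. This is a back-of-the-envelope sketch, assuming every packet that was not captured was dropped for lack of processing capacity, and that each lookup blocks its worker until it returns. The function names are made up for illustration, and the 50 ms lookup time in the usage note is a hypothetical figure, not a measurement from this setup.

```python
def slowdown_factor(captured_fraction):
    """How far processing lags the arrival rate.

    Assumes everything that was not captured was dropped because the
    pipeline could not keep up.
    """
    if not 0 < captured_fraction <= 1:
        raise ValueError("captured_fraction must be in (0, 1]")
    return 1 / captured_fraction


def max_events_per_second(workers, lookups_per_event, seconds_per_lookup):
    """Upper bound on throughput when every lookup blocks its worker."""
    return workers / (lookups_per_event * seconds_per_lookup)
```

With 20% of packets captured, `slowdown_factor(0.2)` gives 5. For comparison, `max_events_per_second(1, 2, 0.05)` shows that one worker doing two lookups per event at a hypothetical 50 ms each cannot exceed about 10 events per second, which is why lookup latency alone could explain the backlog.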

Which likely means I need to batch the operation on the redis side; simplistically, 2 lists, e.g.
logstash:redis:rawips
logstash:redis:hostnamesresolved

and a script to read the logstash:redis:rawips list for each unique raw IP in turn and reindex all associated items into the logstash:redis:hostnamesresolved list.

My head hurts; any volunteers?
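
A rough sketch of what such a script might look like, under several assumptions: events sit in the raw list as JSON strings, the raw addresses are in fields named like `ipv4_src_addr` and `ipv4_dst_addr` (the latter is a guess by analogy), and the client object exposes `lpop(name)` and `rpush(name, value)`, as redis-py's `redis.Redis` does. Rather than grouping items by unique IP and reindexing them together, it walks the list in order and keeps an in-memory cache, so each distinct address is still looked up only once. `drain` and `reverse_lookup` are hypothetical names.

```python
import json
import socket


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def drain(client, src_list, dst_list, fields, lookup=reverse_lookup, cache=None):
    """Move every event from src_list to dst_list, resolving each IP once.

    client: any object with lpop(name) and rpush(name, value).
    fields: maps a raw-IP field name to the hostname field to fill in.
    Returns the number of events moved.
    """
    cache = {} if cache is None else cache
    moved = 0
    while True:
        raw = client.lpop(src_list)
        if raw is None:  # source list is empty
            return moved
        event = json.loads(raw)
        for ip_field, host_field in fields.items():
            ip = event.get(ip_field)
            if ip is None:
                continue
            if ip not in cache:
                # Failed lookups are cached too (as None) to avoid retries.
                cache[ip] = lookup(ip)
            # Fall back to the raw IP when there is no PTR record.
            event[host_field] = cache[ip] or ip
        client.rpush(dst_list, json.dumps(event))
        moved += 1
```

Called as `drain(client, "logstash:redis:rawips", "logstash:redis:hostnamesresolved", {"ipv4_src_addr": "ipv4_src_hostname", "ipv4_dst_addr": "ipv4_dst_hostname"})`, it would sit at #2 in the pipeline, with the 4 indexers reading from the resolved list instead of the raw one. The cache here never expires, so a long-running version would need some eviction policy.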
