DNS slower with cache enabled?

Hi,

I'm working on a Logstash installation which feeds log messages from various sources (syslog, Windows, Check Point firewalls etc.) into Elasticsearch for analysis and monitoring. Everything was working nicely until I added a DNS filter to resolve domain names to IP addresses and vice versa:

...
if [SI_Source_IP] =~ /^\d+\.\d+\.\d+\.\d+/ {
  dns {
    reverse => [ "SI_Source_FQDN" ]
    action => "replace"
  }
} else {
  dns {
    resolve => [ "SI_Source_IP" ]
    action => "replace"
  }
}
...

(SI_Source_FQDN and SI_Source_IP contain the same value before this fragment executes, either an FQDN or an IP address). Similar code deals with the Destination_IP/FQDN.
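One small thing I spotted while testing the branch condition: the `=~` pattern only anchors the start of the string, so a hostname that happens to begin with four dotted numbers would also take the reverse branch. A quick check in plain Ruby (the regexes here mirror the filter condition; the `strict` variant is my suggestion, not what's in the config):

```ruby
# The pattern from the filter condition only anchors the start:
loose  = /^\d+\.\d+\.\d+\.\d+/
# Anchoring both ends avoids matching hostnames that merely
# start with four dotted numbers:
strict = /\A\d+\.\d+\.\d+\.\d+\z/

puts !!("10.1.2.3" =~ loose)              # true
puts !!("10.1.2.3.example.com" =~ loose)  # true - would take the reverse branch
puts !!("10.1.2.3.example.com" =~ strict) # false
puts !!("10.1.2.3" =~ strict)             # true
```

Probably harmless with sensible hostnames, but worth knowing about.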

I've got four filter workers and four output workers (any more than this didn't seem to increase performance).

Without any DNS lookups I was getting a throughput of 4000 messages/s. With the DNS code added this drops to around 1000/s. But what really surprised me was that adding caching reduced throughput even further, to about 600 messages/s:

...
hit_cache_size => 8000
hit_cache_ttl => 300
failed_cache_size => 1000
failed_cache_ttl => 10
...

Logstash is running on two VMs, each with two virtual CPU cores and 4GB RAM. According to the customer they are "well resourced" and the current load is not topping out the CPUs.

Am I making any stupid mistakes? Is there anything I could do to improve performance? Why is adding caching making things worse rather than better?

cheers,
Tom

I've just noticed that the thread-safe cache class used in the plugin only allows one worker at a time to call out to DNS (unless my understanding of Ruby synchronisation is way off). So I'm going to fork the plugin and modify it to allow concurrent lookups.
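For anyone curious, here's a minimal sketch of the pattern I believe is causing this (this is NOT the plugin's actual code; the class names and timing harness are mine). If a read-through cache holds one Mutex for the whole fetch-or-resolve path, every worker queues behind whichever thread is mid-lookup, so on a cache miss the cache is strictly worse than no cache. Locking only around the cache reads/writes lets misses resolve concurrently:

```ruby
require 'thread'

# Sketch of a synchronized read-through cache: the mutex is held for
# the whole lookup, so only one thread can be resolving at any moment,
# regardless of how many filter workers are configured.
class SerializingCache
  def initialize
    @mutex = Mutex.new
    @store = {}
  end

  def fetch(key)
    @mutex.synchronize do
      @store[key] ||= yield(key)   # slow DNS call runs under the lock
    end
  end
end

# Variant that only locks around cache access. Two threads may race on
# the same key and both resolve it once, which is acceptable for DNS.
class ConcurrentCache
  def initialize
    @mutex = Mutex.new
    @store = {}
  end

  def fetch(key)
    cached = @mutex.synchronize { @store[key] }
    return cached if cached
    value = yield(key)             # slow DNS call outside the lock
    @mutex.synchronize { @store[key] = value }
    value
  end
end

# Measure how many simulated lookups are in flight at once.
def max_concurrency(cache, keys)
  inflight = 0
  peak = 0
  m = Mutex.new
  threads = keys.map do |k|
    Thread.new do
      cache.fetch(k) do
        m.synchronize { inflight += 1; peak = [peak, inflight].max }
        sleep 0.05                 # simulate a slow DNS round trip
        m.synchronize { inflight -= 1 }
        "resolved-#{k}"
      end
    end
  end
  threads.each(&:join)
  peak
end

puts max_concurrency(SerializingCache.new, %w[a b c d])  # 1
puts max_concurrency(ConcurrentCache.new, %w[a b c d])   # up to 4
```

With four workers and mostly cold keys, the serialized version would explain exactly the drop I'm seeing: lookups that used to run on four threads now run one at a time.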

Solution here -

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.