Slow log file read from url for geoip


(Hans) #1

Hi All,

I am currently facing challenges with geoip processing for the url field. Does anyone have suggestions for increasing the processing speed? Here is the filter:
geoip {
  source => "url"
  target => "geoip"
  database => "/usr/share/GeoIP/GeoLiteCity.dat"
  add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
  add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}

mutate {
  convert => [ "[geoip][coordinates]", "float" ]
}

geoip { source => "url" }

geoip {
  source => "url"
  target => "geoIPASN"
  database => "/usr/share/GeoIP/GeoIP.dat"
}


(Magnus Bäck) #2

No obvious room for improvement AFAICT. What event rate are you getting? How do you know it's these filters that are slowing things down?


(Hans) #3

When I take them out, the file is processed at about 27K records every few seconds. Is it possible that, because the source is a URL, a DNS lookup is done for every record?


(Magnus Bäck) #4

Ah, right. Yes, if the source is a hostname then it takes a DNS lookup to get an IP address to look up. Apart from making sure you have a fast caching DNS server, you can increase the number of filter workers with the -w startup option.
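A sketch of that startup flag, assuming Logstash is launched from the command line (the worker count and config path are illustrative):

    # Run with 4 filter workers instead of the default
    bin/logstash -w 4 -f /etc/logstash/conf.d/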


(Hans) #5

Magnus, I have been searching for a way to do an initial URL-to-IP lookup and then use only the IP address field for the rest of the filters. Do you have any suggestions?


(Magnus Bäck) #6

Have you looked at the dns filter?
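A sketch of that approach, assuming the hostname lives in the url field: copy it into a new field (resolved_ip is a made-up name here), let the dns filter replace that copy with an IP address, and point each geoip filter at the copy instead of url:

    mutate {
      add_field => { "resolved_ip" => "%{url}" }
    }
    dns {
      resolve => [ "resolved_ip" ]
      action => "replace"
    }
    geoip {
      source => "resolved_ip"
      target => "geoip"
      database => "/usr/share/GeoIP/GeoLiteCity.dat"
    }

This way the DNS lookup happens once per event rather than once per geoip filter.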


(Hans) #7

With the GeoIP information I am not getting anything in the geoip.location field, which seems to be populated by default. I have added the following, since the field that contains the hostname is called url:

geoip {
  source => "url"
  target => "geoip"
  database => "/etc/logstash/GeoLiteCity.dat"
}

The output still does not include a geoip.location field.

Any suggestions why this field is not populated?


(Magnus Bäck) #8

It looks like you're trying to populate geoip.location with the contents of the LATITUDE and LONGITUDE fields but there are no such fields. If you show us your configuration we can help further.


(Hans) #9

Here is the full configuration:

input {
  file {
    type => "BIND_DNS"
    path => [ "/data/bind" ]
    start_position => "beginning"
  }
}

filter {
  grok {
    match => ["message", "(?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{TIME}) queries: info: client %{IPORHOST:clientip}#%{NUMBER:port}: query: (?<url>[a-z0-9-]+.[a-z0-9-]+\S+) IN %{WORD:recType} + (%{IPORHOST:DNSIP})"]
  }

  dns {
    add_field => [ "URL", "FQDN" ]
  }
  dns {
    resolve => [ "URL" ]
    action => [ "replace" ]
  }

  geoip {
    source => "url"
    target => "geoip"
    database => "/usr/share/GeoIP/GeoLiteCity.dat"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }

  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }

  geoip { source => "geoip.ip" }
  geoip {
    source => "geoip.ip"
    target => "geoIPASN"
    database => "/usr/share/GeoIP/GeoIP.dat"
  }
}

output {
  elasticsearch {
    protocol => "node"
    host => "localhost"
    cluster => "elasticsearch"
  }
}


(Magnus Bäck) #10

Is this really the only configuration you have? No other files in /etc/logstash/conf.d that you're forgetting about? I'm asking because I'm pretty sure none of the standard plugins attempt to reference any LATITUDE or LONGITUDE fields via the %{fieldname} notation.
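For reference, the %{fieldname} notation interpolates an event field into a string, and nested fields use bracket syntax. Field names below are illustrative:

    mutate {
      add_field => { "coords" => "%{[geoip][latitude]},%{[geoip][longitude]}" }
    }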


(Hans) #11

No, just the one file in that directory, with the configuration provided.

