Hi all,
I'm trying to get grok to perform better on our firewall logs. Unfortunately, they are Cisco ASA firewalls, which means there are dozens of patterns that every message needs to be matched against - even though the messages themselves contain a tag ('ciscotag') that uniquely maps to one specific pattern. These firewalls produce about 15,000 events/s, so we need to be as efficient as possible.
To avoid building a monstrous if-then-else construct in the Logstash config file, I came up with the idea of using the translate plugin to map the 'ciscotag' to one of the patterns defined in the firewalls file in the patterns directory.
The relevant config bit looks like this:

```
if [message] =~ "ASA|FWSM|PIX" {
  # First grok in order to fill the "ciscotag" field so we can use it
  grok {
    patterns_dir => [ "./patterns" ]
    match => [ "message", "%{CISCO_TAGGED_SYSLOG} %{GREEDYDATA:ul_message}" ]
  }
  # Use a dictionary lookup to get the required pattern
  translate {
    field => "ciscotag"
    destination => "fwpattern"
    dictionary_path => "./dictionary/fw_patterns.yml"
  }
  # Extract fields from each of the detailed message types using the
  # pattern determined above
  grok {
    patterns_dir => [ "./patterns" ]
    match => [ "ul_message", "[fwpattern]" ]
  }
}
```
(I've tried various forms of quoting etc. in that last match (like "%{fwpattern}"), but haven't stumbled upon the right incantation, if this is possible at all. The pattern returned by the dictionary lookup is the correct one in every case I've checked.)
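For reference, the dictionary file just maps tag values to pattern names. The entries below are illustrative only - the actual file has one line per ASA message ID we care about (the pattern names shown are the ones from the stock firewalls pattern file, assuming those match your setup):

```yaml
# ./dictionary/fw_patterns.yml
# ciscotag value -> grok pattern to apply to ul_message
"ASA-6-106015": "%{CISCOFW106015}"
"ASA-6-302013": "%{CISCOFW302013_302014_302015_302016}"
"ASA-4-106023": "%{CISCOFW106023}"
```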
Though this results in good throughput of about 7,800 msg/s on our machine, the resulting output is NOT processed at all! What I get in Elasticsearch is a timestamp, some fields filled by the first grok, and the original message.
Going back to the monstrous if-then-else statement gives about 4,000 msg/s that are processed correctly, but it leads to an ugly, large and relatively dynamic config file - never mind the fact that it performs far worse, if that isn't simply an artifact of the translate version not grok-ing at all.
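To be concrete, the if-then-else version I'd like to avoid looks roughly like this, repeated for every message ID (only two branches shown here; tag values and pattern names are examples):

```
if [ciscotag] == "ASA-6-302013" {
  grok {
    patterns_dir => [ "./patterns" ]
    match => [ "ul_message", "%{CISCOFW302013_302014_302015_302016}" ]
  }
} else if [ciscotag] == "ASA-6-106015" {
  grok {
    patterns_dir => [ "./patterns" ]
    match => [ "ul_message", "%{CISCOFW106015}" ]
  }
}
# ... dozens more branches, one per ciscotag ...
```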
Does anyone know how to do this properly, if it is even possible?