Using translate plugin to speed up grok?

Hi all,

I'm trying to get grok to perform better for our firewall logs. Unfortunately, they are ASA firewalls, which means that there are dozens of patterns that every message needs to be matched against, even though each message contains a tag ('ciscotag') that uniquely maps to one specific pattern. These firewalls produce about 15000 events/s, so we need to be as efficient as possible.

In order to avoid building a monstrous if-then-else construct in the Logstash config file, I came up with the idea of using the translate plugin to map the 'ciscotag' to one of the patterns present in the firewall file in the patterns directory.

Relevant config bit looks like this:

    if [message] =~ "ASA|FWSM|PIX" {

            # First grok in order to fill the "ciscotag" field so we can use it
            grok {
                    patterns_dir => [ "./patterns" ]
                    match => [ "message", "%{CISCO_TAGGED_SYSLOG} %{GREEDYDATA:ul_message}" ]
            }

            # Use a dictionary lookup to get the required pattern
            translate {
                    field => "ciscotag"
                    destination => "fwpattern"
                    dictionary_path => "./dictionary/fw_patterns.yml"
            }

            # Extract fields from each of the detailed message types using the
            # pattern determined above
            grok {
                    patterns_dir => [ "./patterns" ]
                    match => [ "ul_message", "[fwpattern]" ]
            }
    }

(I've tried various forms of quoting, such as "%{fwpattern}", but haven't stumbled upon the right incantation, if this is possible at all. The pattern returned by the dictionary lookup is the correct one in all cases I've seen.)

Though this results in good throughput of about 7800 msg/s on our machine, the resulting output is NOT processed at all! What I get in elasticsearch is a timestamp, some fields filled by the first grok, and the original message.

Going back to the monstrous if-then-else statement results in about 4000 msg/s that are processed correctly, but this leads to an ugly, large and relatively dynamic config file. It also performs far worse, unless the higher figure is just an artifact of not grok-ing correctly.

Does anyone know how to do this properly, if it is even possible?

The only reason you are seeing "good" performance is that it's not doing anything :frowning:

I dunno if this will work, as grok will probably try to look that pattern up in the directory you specified rather than use the pattern you pulled from the translate plugin.

This blog post might help with the other method though - http://ghost.frodux.in/logstash-grok-speeds/

I'm not sure if it's possible to use the value of a field as an option for a filter like that.
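For comparison, here is the explicit form that dispatches on `ciscotag` with conditionals. This is only a minimal sketch: it assumes the `CISCOFW…` pattern names from the firewall patterns file that ships with Logstash and that `ciscotag` holds values such as `ASA-6-302013`. The two tags shown and the `unparsed_ciscotag` tag name are examples picked for illustration; your own patterns file may use different names.

```
filter {
  # One branch per ciscotag, so each event is tried against a single pattern
  if [ciscotag] == "ASA-4-106023" {
    grok {
      patterns_dir => [ "./patterns" ]
      match => [ "ul_message", "%{CISCOFW106023}" ]
    }
  } else if [ciscotag] == "ASA-6-302013" {
    grok {
      patterns_dir => [ "./patterns" ]
      match => [ "ul_message", "%{CISCOFW302013_302014_302015_302016}" ]
    }
  } else {
    # Hypothetical tag name: marks events with no dedicated branch
    mutate { add_tag => [ "unparsed_ciscotag" ] }
  }
}
```

The pattern names are literals in the config here, so they are resolved when the pipeline is loaded rather than per event.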

But I love the translate filter; it is one of the most useful and underrated filters there is. For example, I had a text file several megabytes in size containing IP reputation information, and had the translate filter check whether the IP in each incoming firewall log was on the reputation list.

In your case I would still go with the big if-then-else construct, but I would split it into multiple config files. The first one, called 01start.conf, would do a minimal grok or regex to categorize and tag each event. Then add whatever config files you need, starting at 02firewall.conf. Each of those would start with a conditional on the tag and apply the specific grok for that category. End with a general 99output.conf, or have each specific config file handle its own output.
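A rough sketch of that layout, shown as three files in one listing. The file names follow the ones suggested above; the `firewall` tag name and the stdout output are assumptions made for illustration. When Logstash is pointed at a directory, it reads the files in lexical order and concatenates them, which is why the numeric prefixes matter.

```
# 01start.conf: cheap classification only
filter {
  if [message] =~ /ASA|FWSM|PIX/ {
    mutate { add_tag => [ "firewall" ] }   # hypothetical tag name
  }
}

# 02firewall.conf: detailed parsing for events tagged above
filter {
  if "firewall" in [tags] {
    grok {
      patterns_dir => [ "./patterns" ]
      match => [ "message", "%{CISCO_TAGGED_SYSLOG} %{GREEDYDATA:ul_message}" ]
    }
    # per-ciscotag conditionals and groks would follow here
  }
}

# 99output.conf: shared output (placeholder; swap in your real output)
output {
  stdout { codec => rubydebug }
}
```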

Agreed :slight_smile:

Thanks for the reply :slight_smile:

I've seen that link, but since we're already getting 4000 msg/s I didn't think it was relevant, as it talks about going from 50/s to 600/s. I'll revisit it and actually do some tests.

Thanks for replying :slightly_smiling:

Reworking the config file as you suggest was indeed one of the things I was going to do to make some things easier to manage.

Still, being able to just look up the appropriate pattern to use is quite elegant, IMHO, and makes for more concise config files, which is always nice.

Plus you can modify the external translate dictionary without having to reload/restart Logstash.

Plus plus, the fields created by the translate filter can be used in conditionals. So you can automate how the translate dictionary looks and therefore automate how events flow through the Logstash pipeline.
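A small sketch of that idea, using the same option names as the config earlier in the thread (`field`, `destination`, `dictionary_path`). The `fwclass` field, the `fw_classes.yml` file, the class values and the `unclassified` tag are all hypothetical. The `fallback` and `refresh_interval` options belong to the translate filter, but check the documentation for your plugin version, since option names have changed between releases.

```
filter {
  translate {
    field            => "ciscotag"
    destination      => "fwclass"                        # hypothetical field
    dictionary_path  => "./dictionary/fw_classes.yml"    # hypothetical file
    fallback         => "unknown"    # value used when the tag is not in the dictionary
    refresh_interval => 60           # seconds between checks for dictionary changes
  }

  # Route on the looked-up class instead of on the raw tag
  if [fwclass] == "connection" {
    grok {
      patterns_dir => [ "./patterns" ]
      match => [ "ul_message", "%{CISCOFW302013_302014_302015_302016}" ]
    }
  } else if [fwclass] == "unknown" {
    mutate { add_tag => [ "unclassified" ] }
  }
}
```

Moving a tag to a different class is then just an edit to the dictionary file, while the set of branches in the config stays fixed.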

Elastic should have the translate filter as a default plugin, it is that good, it is that powerful.

Agreed in full :slight_smile: