Logstash Grok Filter failure with match

Unfortunately I'm stuck on some Logstash basics. I want to use Logstash to get logs from a Sophos UTM and do something with the data. I have already tried different things from the documentation, with little success: only the tagging is working.

Message string:

<30>2019:10:22-11:11:37 sophos-utm httpproxy[18718]: id="0003" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="CONNECT" srcip="10.0.0.10" dstip="" user="dummy" group="" ad_domain="DOM" statuscode="407" cached="0" profile="REF_HttProContaInterNetwo (Connections DOM)" filteraction=" ()" size="2487" request="0xd95a2700" url="https://api1.origin.com/" referer="" error="" authtime="62" dnstime="0" aptptime="0" cattime="0" avscantime="0" fullreqtime="968" device="0" auth="2" ua="Mozilla/5.0 EA Download Manager Origin/10.5.50.31938" exceptions=""

My testing pattern configuration:

SOPHOS_MODULE >.*\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}\s?\s\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)\s%{WORD}
SOPHOS_SRCIP srcip=\"%{IPV4}\"

Do I need to escape the double quotes (\") or not? I'm asking because the JSON output (Kibana) shows them with a backslash in the message string.
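For reference, the two variants I am asking about would look like this in a grok match (a minimal sketch; the srcip field name is just for illustration). From what I can tell, the backslashes in the Kibana JSON are only the JSON display escaping:

    # escaped quotes inside a double-quoted config string
    grok { match => { "message" => "srcip=\"%{IPV4:srcip}\"" } }

    # literal quotes inside a single-quoted config string
    grok { match => { "message" => 'srcip="%{IPV4:srcip}"' } }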

My Logstash filter (inputs/outputs are separate and working):

filter {
    grok {
        match => { "message" => "(?<date>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})\s%{HOSTNAME:hostname}\s.*\[(?<ID>\d+)" }
        add_tag => ['vendor_sophos', 'category_firewall']
    }

    if "vendor_sophos" in [tags] {
        grok {
            patterns_dir => "/etc/logstash/patterns"
            match => {
                "message" => "%{SOPHOS_MODULE:sophos_utm_module} %{SOPHOS_SRCIP:sophos_httpd_sccip}"
            }
            add_tag => ['filter_works']
        }
    }
}

If I run this example, the result gets tagged with "_grokparsefailure". If I change it slightly to just:

"message" =>"%{SOPHOS_MODULE:sophos_utm_module}"

The _grokparsefailure is gone and the tag "filter_works" is set correctly, but the sophos_utm_module field has an incorrect value:

"sophos_utm_module": ">2019:10:22-11:13:25 sophos-utm httpproxy",

It should be just "httpproxy"; the pattern tested fine with http://grokdebug.herokuapp.com/

So one problem is that the filter is not working properly (one match is OK, two matches fail), and the second problem is the value of the field.
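Looking at it more, I suspect the literal space between %{SOPHOS_MODULE} and %{SOPHOS_SRCIP} is part of the failure, since in the real message the text [18718]: id="0003" ... sits between the module name and srcip=. And if I understand grok correctly, naming a custom pattern (%{SOPHOS_MODULE:sophos_utm_module}) captures everything the whole pattern matched, not just its trailing %{WORD}. A sketch of what I mean, using only built-in patterns and illustrative field names:

    grok {
        # only the trailing word is named, so only "httpproxy" is captured;
        # the lazy %{DATA} bridges everything between the pid and srcip=
        match => { "message" => "\s%{WORD:sophos_utm_module}\[%{INT:pid}\]:%{DATA}srcip=\"%{IPV4:sophos_httpd_srcip}\"" }
    }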

Thanks in advance for hints on how to solve this.

Stop using grok. You could dissect that.

    dissect { mapping => { "message" => "<%{pri}>%{[@metadata][timestamp]} %{hostname} %{program}[%{pid}]: %{[@metadata][restOfLine]}" } }
    date { match => [ "[@metadata][timestamp]", "YYYY:MM:dd-HH:mm:ss" ] }
    kv { source => "[@metadata][restOfLine]" }
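
With that, dissect pulls out pri, hostname, program, and pid, date parses the timestamp, and kv turns the key="value" pairs in restOfLine into individual event fields (id, severity, action, srcip, and so on).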

First, thanks for your input.

Do you think dissect is the best way to achieve the goal? The log line changes from time to time with different subs (apache, firewall, reverse proxy for publishing, captive portal, VPNs, ...); nearly every sub has its own log line (with various parameters).

The Elastic documentation mentions that dissect and grok can be used together for this case. Maybe you are familiar with that?

For the moment I have a multiple-match grok running; the problem was that Logstash needed a restart, the automatic reload was not enough.

Yes. If all of your log lines have a common prefix (like "<30>2019:10:22-11:11:37 sophos-utm httpproxy[18718]: ") then you can use dissect to pick that apart, and then go after the rest of the message (what I called [@metadata][restOfLine]) with grok or other filters. So basically what I posted demonstrates exactly that.
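
For example, a sketch of that combination (the program names and the per-sub handling here are illustrative; the dissect and date lines are the ones from my earlier post):

    filter {
        dissect { mapping => { "message" => "<%{pri}>%{[@metadata][timestamp]} %{hostname} %{program}[%{pid}]: %{[@metadata][restOfLine]}" } }
        date { match => [ "[@metadata][timestamp]", "YYYY:MM:dd-HH:mm:ss" ] }

        # branch on the program that dissect extracted
        if [program] == "httpproxy" {
            # the http access lines are key="value" pairs, so kv is enough
            kv { source => "[@metadata][restOfLine]" }
        } else {
            # hypothetical fallback for subs whose lines are not key/value shaped
            grok { match => { "[@metadata][restOfLine]" => "%{GREEDYDATA:raw_detail}" } }
        }
    }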

Thank you again for this, it works for me.

One last question:
How do you differentiate the sources/endpoints that send logs to Logstash when only one input is configured (e.g. syslog on port 514)? The endpoints (Cisco, HP, UTM, pfSense, ...) all send their log data, so what is a good practice for tagging the logs and filtering them afterwards? The documentation examples I have found only show a single input with a single filter.

Syslog messages will include a hostname or IP address, so as a first level of tagging you could use a translate filter to map the hostname to a system type. However, syslog data is basically unstructured, so you will end up doing a lot of pattern matching even once you know the system type.
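
For example, a minimal sketch of that first level (the hostnames and system types in the dictionary are placeholders, and depending on your input the source field may be [host] rather than [hostname]):

    filter {
        translate {
            field => "[hostname]"
            destination => "[system_type]"
            dictionary => {
                "sophos-utm" => "sophos_utm"
                "pfsense-fw" => "pfsense"
            }
            fallback => "unknown"
        }

        # route to vendor-specific parsing based on the mapped type
        if [system_type] == "sophos_utm" {
            mutate { add_tag => ["vendor_sophos"] }   # then e.g. the dissect/kv chain from above
        }
    }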
