Logstash Grok Filter failure with match

fuXz · October 22, 2019, 9:41am

Unfortunately I'm stuck on some Logstash basics. I want to use Logstash to collect logs from a Sophos UTM and process the data. I have already tried several approaches from the documentation, with little success: only the tagging works.

Message string:

<30>2019:10:22-11:11:37 sophos-utm httpproxy[18718]: id="0003" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="CONNECT" srcip="10.0.0.10" dstip="" user="dummy" group="" ad_domain="DOM" statuscode="407" cached="0" profile="REF_HttProContaInterNetwo (Connections DOM)" filteraction=" ()" size="2487" request="0xd95a2700" url="https://api1.origin.com/" referer="" error="" authtime="62" dnstime="0" aptptime="0" cattime="0" avscantime="0" fullreqtime="968" device="0" auth="2" ua="Mozilla/5.0 EA Download Manager Origin/10.5.50.31938" exceptions=""

My testing pattern configuration:

SOPHOS_MODULE >.*\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}\s?\s\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)\s%{WORD}
SOPHOS_SRCIP srcip=\"%{IPV4}\"

Do I need to escape the double quotes as \" or not? I ask because the JSON output (Kibana) shows them with a backslash in the message string.

My logstash filter (inputs / outputs are separated and working):

filter {
    grok {
        match => {"message" => "(?<date>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2})\s%{HOSTNAME:hostname}\s.*\[(?<ID>\d+)"}
        add_tag => ['vendor_sophos', 'catecory_firewall']
    }

    if "vendor_sophos" in [tags]
    {

         grok{
             patterns_dir => "/etc/logstash/patterns"
             match => {
                         "message" =>"%{SOPHOS_MODULE:sophos_utm_module} %{SOPHOS_SRCIP:sophos_httpd_sccip}"

             }
             add_tag => ['filter_works']
         }
    }
}

When I run this example, the event is tagged "_grokparsefailure". If I reduce the second match to just:

"message" =>"%{SOPHOS_MODULE:sophos_utm_module}"

The grokparsefailure is gone and the tag "filter_works" is set correctly, but the sophos_utm_module field has the wrong value:

"sophos_utm_module": ">2019:10:22-11:13:25 sophos-utm httpproxy",

It should be just "httpproxy". The pattern tested fine at http://grokdebug.herokuapp.com/

So there are two problems: the filter fails as soon as it combines two patterns (one pattern matches, two do not), and the captured field value is wrong.

Thanks in advance for hints on how to solve this.
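
A note on the two symptoms, offered as an untested sketch rather than a confirmed fix. Grok stores everything a named pattern matches, so SOPHOS_MODULE, which starts at ">" and runs through the program name, captures the whole prefix. The combined match also expects srcip= to follow the module after a single space, while the sample line has [18718]: id="0003" and more in between. On the escaping question: a patterns file is read as a plain regular expression, so the backslash before the quote should not be needed there; inside a double-quoted config string the quotes do need escaping, which single quotes avoid. One way to express the intent with stock patterns only (the field names syslog_pri, pid and sophos_httpd_srcip are illustrative, not from the original config):

```
filter {
  grok {
    # Single-quoted string: the literal double quotes need no escaping.
    # Only sophos_utm_module captures the program name; the rest of the
    # prefix goes into its own fields instead of one custom pattern.
    match => {
      "message" => '^<%{NONNEGINT:syslog_pri}>(?<date>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) %{HOSTNAME:hostname} %{WORD:sophos_utm_module}\[%{POSINT:pid}\]: .*?srcip="%{IPV4:sophos_httpd_srcip}"'
    }
    add_tag => ['vendor_sophos']
  }
}
```

The lazy .*? skips the key/value pairs between the prefix and srcip, which is the part the two-pattern match did not account for.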

Badger · October 22, 2019, 4:05pm

Stop using grok. You could dissect that.

    dissect { mapping => { "message" => "<%{pri}>%{[@metadata][timestamp]} %{hostname} %{program}[%{pid}]: %{[@metadata][restOfLine]}" } }
    date { match => [ "[@metadata][timestamp]", "YYYY:MM:dd-HH:mm:ss" ] }
    kv { source => "[@metadata][restOfLine]" }

fuXz · October 22, 2019, 4:43pm

First thanks for your input.

Do you think dissect is the best way to achieve the goal? The log line changes from time to time depending on the sub (Apache, firewall, reverse proxy for publishing, captive portal, VPNs, ...). Nearly every sub has its own log line format with various parameters.

The Elastic documentation mentions that dissect and grok can be used together for this case. Maybe you are familiar with that?

For the moment I have a multiple-pattern match running via grok. The problem was that Logstash needed a restart; the automatic reload was not enough.

Badger · October 22, 2019, 8:11pm

Yes. If all of your log lines have a common prefix (like "<30>2019:10:22-11:11:37 sophos-utm httpproxy[18718]: ") then you can use dissect to pick that apart, and then go after the rest of the message (what I called [@metadata][restOfLine]) with grok or other filters. So basically what I posted demonstrates exactly that.
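
A minimal sketch of that split, assuming the dissect mapping and field names posted earlier in this thread: dissect handles the common prefix, and a conditional on [program] decides how the remainder is parsed. The program name "some-other-program" and the field "detail" are placeholders, not real Sophos values.

```
filter {
  # Common prefix: priority, timestamp, host, program and pid.
  dissect {
    mapping => { "message" => "<%{pri}>%{[@metadata][timestamp]} %{hostname} %{program}[%{pid}]: %{[@metadata][restOfLine]}" }
  }
  date { match => [ "[@metadata][timestamp]", "YYYY:MM:dd-HH:mm:ss" ] }

  if [program] == "httpproxy" {
    # The sample line is pure key="value" pairs, so kv is enough here.
    kv { source => "[@metadata][restOfLine]" }
  } else if [program] == "some-other-program" {
    # Placeholder branch: a sub whose payload is not key=value
    # would get its own grok pattern here.
    grok { match => { "[@metadata][restOfLine]" => "%{GREEDYDATA:detail}" } }
  }
}
```

Because the remainder lives under [@metadata], it is available to later filters but is not sent to the output.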

fuXz · October 24, 2019, 11:21am

Thank you again for this, this works for me.

One last question:
How do you distinguish the sources/endpoints that send logs to Logstash when only one input is configured (e.g. syslog on port 514)? The endpoints (Cisco, HP, UTM, pfSense, ...) all send their log data to it, so what is good practice for tagging the logs and filtering them afterwards? The examples in the documentation I have found show one single input with a single filter.

Badger · October 24, 2019, 2:55pm

Syslog messages will include a hostname or IP address, so as a first level of tagging you could use a translate filter to map the hostname to a system type. However, syslog data is basically unstructured, so you will end up doing a lot of pattern matching even once you know the system type.
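
A hedged sketch of that first level of tagging. It assumes the sender's name is already in a field called hostname (as produced by the dissect mapping earlier in this thread); with a different input the field name may differ. The dictionary entries other than sophos-utm are made-up hosts, and [@metadata][system_type] is an illustrative target. Recent versions of the translate filter use source and target; older versions used field and destination for the same options.

```
filter {
  translate {
    source     => "hostname"
    target     => "[@metadata][system_type]"
    dictionary => {
      "sophos-utm"     => "sophos_utm"
      "core-switch-01" => "cisco"      # hypothetical host
      "edge-fw-01"     => "pfsense"    # hypothetical host
    }
    # Used when the hostname is not in the dictionary.
    fallback   => "unknown"
  }

  # Route each system type to its own parsing branch.
  if [@metadata][system_type] == "sophos_utm" {
    mutate { add_tag => ["vendor_sophos"] }
  }
}
```

For a long host list, the translate filter can also load the dictionary from an external file instead of the inline hash.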

system · November 21, 2019, 2:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
