Logstash - Grok Syntax Issues

I'm using Filebeat to send logs to Logstash, but I'm having issues with the grok syntax in Logstash. I used the Grok Debugger in Kibana and managed to come up with a working pattern.
The problem is that I can't get the same syntax to work in Logstash.

The original log:

{"log":"188.188.188.188 - tgaro [22/Aug/2022:11:37:54 +0200] \"PROPFIND /remote.php/dav/files/xxx@yyyy.com/ HTTP/1.1\" 207 1035 \"-\" \"Mozilla/5.0 (Windows) mirall/2.6.1stable-Win64 (build 20191105) (Nextcloud)\"\n","stream":"stdout","time":"2022-08-22T09:37:54.782377901Z"}

The message received in Logstash:

"message" => "{\"log\":\"188.188.188.188 - tgaro [22/Aug/2022:11:37:54 +0200] \\\"PROPFIND /remote.php/dav/files/xxx@yyyy.com/ HTTP/1.1\\\" 207 1035 \\\"-\\\" \\\"Mozilla/5.0 (Windows) mirall/2.6.1stable-Win64 (build 20191105) (Nextcloud)\\\"\\n\",\"stream\":\"stdout\",\"time\":\"2022-08-22T09:37:54.782377901Z\"}",

The grok pattern I used in the Grok Debugger (Kibana):

{\\"log\\":\\"%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \\\\\\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\\\\\\" (?:-|%{NUMBER:response}) (?:-|%{NUMBER:bytes}) \\\\\\("%{DATA:referrer}\\\\\\") \\\\\\"%{DATA:user-agent}\\\\\\"

The real problem is that I can't even manage to extract the IP (188.188.188.188).
I tried:

match => { "message" => '{\\"log\\":\\"%{IPORHOST:clientip}' } # backslash to escape the backslash
match => { "message" => '{\\\"log\\\":\\\"%{IPORHOST:clientip}' } # backslash to escape the quote
match => { "message" => "{\\\"log\\\":\\\"%{IPORHOST:clientip}" } # backslash to escape the quote

Help would be appreciated.
Thanks!

PS: The log used here is shortened. The real log mixes JSON and plain text, so I can't send it as JSON from Filebeat.
PS2: First time posting on these forums; not sure if I'm in the right place.

The original log you shared is a JSON document with a plain-text message in the log field. Is this what you mean by mixed, or did you remove anything from your log?

You will have a lot of trouble trying to find a grok pattern that parses a JSON message. The best approach is to first use the json filter on your original message and then use grok on the log field created by the json filter.

The json filter will give you these fields:

{
  "log": "188.188.188.188 - tgaro [22/Aug/2022:11:37:54 +0200] \"PROPFIND /remote.php/dav/files/xxx@yyyy.com/ HTTP/1.1\" 207 1035 \"-\" \"Mozilla/5.0 (Windows) mirall/2.6.1stable-Win64 (build 20191105) (Nextcloud)\"\n",
  "stream": "stdout",
  "time": "2022-08-22T09:37:54.782377901Z"
}

From this you can build a grok for the log field.
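As a minimal sketch of that approach (the grok pattern here only extracts the client IP, as an illustration; the full pattern would follow the same shape):

```
filter {
  # First parse the outer JSON document in the "message" field
  json {
    source => "message"
  }
  # Then grok the plain-text line now stored in the "log" field
  grok {
    match => { "log" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] %{GREEDYDATA:rest}" }
  }
}
```

Note that after the json filter runs, the quotes inside log are no longer escaped, so the grok pattern does not need the \\" sequences used against the raw message.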

What I meant by "the log used here is shortened" is that the original log has this shape:

Aug 24 00:00:01 hostname containers: {"log":"188.188.188.188 - user.name@things.com [23/Aug/2022:23:59:52 +0200] \"PROPFIND /remote.php/dav/files/ HTTP/1.1\" 207 1159 \"-\" \"Mozilla/5.0 (Linux) mirall/3.4.2-1ubuntu1 (Nextcloud, ubuntu-5.15.0-46-generic ClientArchitecture: x86_64 OsArchitecture: x86_64)\"\n","stream":"stdout","time":"2022-08-23T21:59:52.612843092Z"}

I have no issue parsing the start of the log, but when it comes to {"log": IP .....} I can't find a way to parse it in Logstash. In the Grok Debugger in Kibana it works well.

It is the same approach: you have a string part and a JSON part. You should split them into different fields and parse the JSON part using the json filter.

How are you parsing the first part of the log? Please share your full configuration to make it easy to understand.

For example, using dissect you could parse like this:

dissect {
    mapping => {
        "message" => "%{month} %{day} %{time} %{hostname} containers: %{jsonMessage}"
    }
}

This would give you something like this:

{
  "month": "Aug",
  "day": "24",
  "time": "00:00:01",
  "hostname": "hostname",
  "jsonMessage": {"log":"188.188.188.188 - user.name@things.com [23/Aug/2022:23:59:52 +0200] \"PROPFIND /remote.php/dav/files/ HTTP/1.1\" 207 1159 \"-\" \"Mozilla/5.0 (Linux) mirall/3.4.2-1ubuntu1 (Nextcloud, ubuntu-5.15.0-46-generic ClientArchitecture: x86_64 OsArchitecture: x86_64)\"\n","stream":"stdout","time":"2022-08-23T21:59:52.612843092Z"}
}

You would then use a json filter to parse the jsonMessage to extract the log field and make it easier to parse with grok or dissect.
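That step would be just this (assuming the jsonMessage field produced by the dissect above):

```
json {
  source => "jsonMessage"
}
```

This adds the log, stream, and time fields to the event, and a later grok or dissect can then target the log field directly.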

Hm, I didn't know I could treat the first part of the log as a string and the JSON as JSON. I don't know how to do that. Do I need to use dissect, so that dissect creates a field with my JSON data in it, and then run a grok on that field?

The first part is handled like this:

match => { "message" => '%{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{WORD:syslog_tag}:' }

I tried this and it didn't work. I'm assuming the patterns only work in a grok expression:

dissect {
    mapping => {
        "message" => '%{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{WORD:syslog_tag}: %{jsonMessage}'
    }
}

Yes. Dissect does not use regex; it relies only on position within the message. It is best used when your log structure does not change: for example, the first field is always the month name, the second is always the day, the third is always the time, and so on.

Since it does not use regex, it uses far fewer CPU resources, but if your message format changes frequently it may not be the best solution, or you will need conditionals and multiple dissect blocks.

But you can do the same thing with grok.

It would be something like this, I think.

match => { "message" => '%{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{WORD:syslog_tag}: %{GREEDYDATA:jsonMessage}' }

OK, so I managed to make it work by using this:

grok {
    match => { "message" => '%{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{WORD:syslog_tag}: %{GREEDYDATA:jsonMessage}' }
}
json {
    source => "jsonMessage"
}
grok {
    match => { "jsonMessage" => '%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\\" (?:-|%{NUMBER:response}) (?:-|%{NUMBER:bytes}) \\("%{DATA:referrer}\\") \\"%{DATA:user-agent}\\"' }
}

I still don't understand why my grok works in the Kibana debugger but not in Logstash. It's probably because some characters need to be escaped, but even when I escaped them it didn't work.

Thanks for the help! It was very helpful.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.