Grok pattern

Hi, I am new to Logstash. Can I get help creating a grok pattern for the following log line?

[20/Apr/2023:11:25:44.389 +0530] 200 | 38 ms | 2 B | 172.31.40.179 |        172.31.40.179 | 8DCE7CA611DB96B7B6767151C08E79B6 | - | "GET /3dpassport/resources-210831082202/js/DS/W3DPassport/W3DPassport.js HTTP/1.1" | -

I started with
[%{HTTPDATE:timestamp}] | %{NUMBER:response} | %{NUMBER:duration}

But the duration field captures only 38 and not the ms unit. How can I capture "38 ms" in duration? The same applies to the next field.

\[%{HTTPDATE:timestamp}\]%{SPACE}%{POSINT:response}%{SPACE}\|%{SPACE}%{INT:responsetime}%{SPACE}%{WORD:timeunit}%{SPACE}\|%{SPACE}%{INT:size}%{SPACE}%{WORD:sizeunit}%{SPACE}\|%{SPACE}%{IP:srcip}%{SPACE}\|%{SPACE}%{IP:destip}%{SPACE}\|%{SPACE}%{DATA:hash}%{SPACE}\|%{SPACE}%{DATA:something}%{SPACE}\|%{SPACE}%{DATA:methodurl}%{SPACE}\|%{SPACE}%{GREEDYDATA:something2}

Notes:

  • 38 ms and 2 B are each split into a numeric value and a unit
  • the something and something2 fields are placeholders and should be renamed
  • check whether srcip and destip are in the right order or should be swapped
  • methodurl can be parsed further into method and URL
  • timestamp should be converted to a date
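
A minimal sketch of the last two notes, assuming the field names timestamp and methodurl from the pattern above. The names method, url and http_version are only illustrative, and I have not run this against the full log:

```
filter {
  # Split the request line into method, URL and HTTP version.
  # The pattern is not anchored, so the surrounding double quotes in methodurl are skipped.
  grok {
    match => { "methodurl" => "%{WORD:method} %{NOTSPACE:url} HTTP/%{NUMBER:http_version}" }
  }
  # Parse the access-log timestamp (milliseconds plus numeric zone offset) into @timestamp.
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS Z" ]
  }
}
```

If the date filter cannot parse the value, it tags the event with _dateparsefailure, so that tag is worth checking after the first run.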

OK, I was able to put together this grok pattern:

\[%{HTTPDATE:timestamp}\] \| %{NUMBER:response} \| (?<duration>%{NUMBER} %{WORD}) \| (?<bytes>%{NUMBER} %{WORD}) \| %{IP:hostip} \| %{SPACE} %{IP:clientip} \| %{WORD:token} \| %{DATA:unknown} \| "(?<method>%{WORD}) (?<url>%{URIPATHPARAM}) (?:HTTP/%{NUMBER:http_version})" \| %{EMAILADDRESS:loginemail}

Output:

{
  "method": "GET",
  "hostip": "172.31.40.179",
  "http_version": "1.1",
  "url": "/3dpassport/cas/getcastgc?userid=admin_platform&service=V6&ticket=ST-4-KMILf6L6d6KgipUJoAqS-cas&callback=https%3A%2F%2F3ds%2eaws%2eminutuscloud%2ecom%2F3dspace%2Fwebapps%2Fi3DXCompass%2Fgettgc%2ehtml",
  "token": "8DCE7CA611DB96B7B6767151C08E79B6",
  "unknown": "-",
  "duration": "428 ms",
  "loginemail": "admin_platform@myemail.com",
  "response": "200",
  "bytes": "630 B",
  "clientip": "172.31.40.179",
  "timestamp": "20/Apr/2023:11:26:36.578 +0530"
}

Here is my Logstash conf file:

input {
    file {
        # type => "tomeeaccess_passport"
        path => "/etc/logstash/logs/localhost_access_log..2023-04-20.txt"
        start_position => "beginning"
        # sincedb_path => "/opt/logstash/sincedb-access"
    }
}
filter {
    grok {
        match => { "message" => "\[%{HTTPDATE:timestamp}\] \| %{NUMBER:response} \| (?<duration>%{NUMBER} %{WORD}) \| (?<bytes>%{NUMBER} %{WORD}) \| %{IP:hostip} \| %{SPACE} %{IP:clientip} \| %{WORD:token} \| %{DATA:unknown} \| \"(?<method>%{WORD}) (?<url>%{URIPATHPARAM}) (?:HTTP/%{NUMBER:http_version})\" \| %{EMAILADDRESS:loginemail}" }
    }
}

output {
    elasticsearch {
        hosts => "http://ip:9200"
        user => "elastic"
        password => "pwd"
        index => "access_passport%{+YYYY.MM.dd}"
    }
}

I am getting a grok parse failure. Can you tell me where I am going wrong?
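
While debugging, it can help to print only the events that failed to parse. A small sketch, assuming the failure tag is still the grok default, _grokparsefailure (it can be changed with tag_on_failure):

```
output {
  # Print events that grok could not parse so the offending lines are easy to spot.
  if "_grokparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```

Comparing the printed message field against the pattern, field by field, usually shows which part of the line does not match.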

Hi, OK, I was able to figure out the grok parse failure. In the log sample that I shared, there are a couple of optional fields. For example:

[20/Apr/2023:11:27:43.094 +0530] | 200 | 82 ms | 4196 B | 172.31.40.179 |        172.31.40.179 | 8DCE7CA611DB96B7B6767151C08E79B6 | - | "GET /3dpassport/api/authenticated/user/fields HTTP/1.1" | admin_platform@myemail.com

and

[20/Apr/2023:11:27:43.094 +0530] | 200 | 82 ms | **- B** | 172.31.40.179 |        172.31.40.179 | - | **-** | "GET /3dpassport/api/authenticated/user/fields HTTP/1.1" | admin_platform@myemail.com

The differences are shown in bold.
For every entry that has a - in place of the string or number, I get a _grokparsefailure. Each of these fields has two possible values: either a - or something that matches the grok pattern I have added. How can I handle this?
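
One possible approach, sketched under the assumption that every line looks like the two samples above (with a pipe after the timestamp, and "- B" when the size is missing): wrap each optional field in a non-capturing group that also accepts a literal -. I have not run this against the full log, so treat it as a starting point:

```
filter {
  grok {
    # Each optional field is written as (?:<pattern>|-): either the real value or a lone dash.
    # When the dash branch matches, the named field is simply not added to the event.
    match => { "message" => "\[%{HTTPDATE:timestamp}\] \| %{NUMBER:response} \| (?<duration>%{NUMBER} %{WORD}) \| (?:(?<bytes>%{NUMBER} %{WORD})|- %{WORD}) \| %{IP:hostip} \|%{SPACE}%{IP:clientip} \| (?:%{WORD:token}|-) \| %{DATA:unknown} \| \"(?<method>%{WORD}) (?<url>%{URIPATHPARAM}) HTTP/%{NUMBER:http_version}\" \| (?:%{EMAILADDRESS:loginemail}|-)" }
  }
}
```

With this form, an event whose token, bytes or loginemail is - should still parse, just without that field. If you would rather keep the dash as the field value, move the alternation inside the capture instead, for example (?<token>%{WORD}|-).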
