Grok pattern for CloudFront logs does not work

Hello all, can anybody help me?
I am using this grok pattern for CloudFront logs:
"%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}%{SPACE}%{TIME:time}%{SPACE}(?<x_edge_location>\b\w+\b)%{SPACE}(?:%{NUMBER:sc_bytes}|-)%{SPACE}%{IPORHOST:clientip}%{SPACE}%{WORD:cs_method}%{SPACE}%{HOSTNAME:cs_host}%{SPACE}%{NOTSPACE:cs_uri_stem}%{SPACE}%{NUMBER:sc_status}%{SPACE}%{URI:referrer}%{SPACE}%{QS:agent}%{SPACE}%{GREEDYDATA:cs_uri_query}%{SPACE}%{GREEDYDATA:cookies}%{SPACE}%{WORD:x_edge_result_type}%{SPACE}%{NOTSPACE:x_edge_request_id}%{SPACE}%{HOSTNAME:x_host_header}%{SPACE}%{GREEDYDATA:cs_protocol}%{SPACE}%{INT:cs_bytes}%{SPACE}%{GREEDYDATA:time_taken}%{SPACE}%{GREEDYDATA:x_forwarded_for}%{SPACE}%{GREEDYDATA:ssl_protocol}%{SPACE}%{GREEDYDATA:ssl_cipher}%{SPACE}%{GREEDYDATA:x_edge_response_result_type}%{SPACE}%{GREEDYDATA:cs_protocol_version}"
The log is:
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version
2017-01-01 00:38:23 DEE50 30609 66.81.46.575 GET gdfgd242.cloudfront.net /ee/jquery.min.js 200 https://test.org/ Mozilla/5.0 (Linux; Android 5.1; XT1526 Build/LPI23.29-18-S.11) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.93 Mobile Safari/537.36 - - Hit NjK22JT44gv2HLyIEuXObZJfMuny9n_sheCpSSTKpo2mw0ZRVjr7rQ== test.org https 371 0.023 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/1.1
2017-01-01 00:38:23 DEE50 2191 65.17.95.57 GET gdfgd242.cloudfront.net /ee/md5.min.js 200 https://test.org/ Mozilla/5.0 (Linux; Android 5.1; XT1526 Build/LPI23.29-18-S.11) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.93 Mobile Safari/537.36 - - Hit mjSw9dEErWv3EdZULzjuREIYBm8yFg59kUU2WWBS2hg-e4o9k2LOtw== test.org https 368 0.018 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/1.1

In the log messages you show, the user agent is not a quoted string, so %{QS:agent} cannot match it. If you fix that, your next problem is that the GREEDYDATA for cs_uri_query will consume the whole of the rest of the line.
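
To illustrate the second point, here is a minimal sketch (not a full CloudFront pattern) that matches only the last three fields of the line. It uses NOTSPACE, which stops at the next whitespace, where GREEDYDATA (`.*`) would first run to the end of the line. The field names are taken from your pattern; the trailing `$` anchor is my addition, and I have not run this against your data.

```
filter {
  grok {
    # Fragment only: each NOTSPACE matches exactly one whitespace-delimited token,
    # and "$" pins the match to the end of the line.
    match => { "message" => "%{NOTSPACE:ssl_cipher} %{NOTSPACE:x_edge_response_result_type} %{NOTSPACE:cs_protocol_version}$" }
  }
}
```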

If the agent entries in the log file are quoted, I think you would do better using dissect.

dissect { mapping => { "message" => '%{year}-%{month}-%{day} %{time} %{x_edge_location} %{sc_bytes} %{clientip} %{cs_method} %{cs_host} %{cs_uri_stem} %{sc_status} %{referrer} "%{agent}" %{cs_uri_query} %{cookies} %{x_edge_result_type} %{x_edge_request_id} %{x_host_header} %{cs_protocol} %{cs_bytes} %{time_taken} %{x_forwarded_for} %{ssl_protocol} %{ssl_cipher} %{x_edge_response_result_type} %{cs_protocol_version}' } }

Although personally I would replace '%{year}-%{month}-%{day} %{time}' with '%{ts} %{+ts}'.
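
A minimal sketch of that idea, assuming the date and time are the first two space-separated fields. The field name `rest` is only a placeholder for the remainder of the mapping, and the date filter is an optional extra that goes beyond the suggestion above. CloudFront writes these timestamps in UTC.

```
filter {
  dissect {
    mapping => {
      # %{ts} takes the date and %{+ts} appends the time to the same field,
      # so ts should come out as "2017-01-01 00:38:23".
      # "rest" is a hypothetical catch-all for everything after the time.
      "message" => '%{ts} %{+ts} %{rest}'
    }
  }
  date {
    # Optional: use the combined field as the event timestamp.
    match => [ "ts", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
}
```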

If they are not quoted, it is going to be harder. You will need to grok the front and back of the log entry separately (anchored) and then gsub them off to leave the rest of the line as the agent string.

I changed the logs a little:

2017-01-01 00:38:23 DEE50 30609 66.81.46.575 GET gdfgd242.cloudfront.net /ee/md5.min.js 200 https://test.org/ Mozilla/5.0%2520(Linux;%2520Android%25205.1;%2520XT1526%2520Build/LPI23.29-18-S.11)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/39.0.2171.93%2520Mobile%2520Safari/537.36 - - Hit NjK22JT44gv2HLyIEuXObZJfMuny9n_sheCpSSTKpo2mw0ZRVjr7rQ== test.org https 371 0.023 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/1.1

But it still does not work for me :frowning:

That is a horrible, horrible log format. But logstash can do it.

    grok { match => { "message" => [ "%{NOTSPACE:e1} %{NOTSPACE:e2} %{NOTSPACE:e3} %{NOTSPACE:e4} %{NOTSPACE:domain} %{NOTSPACE:protocol} %{NOTSPACE:e7:int} %{NOTSPACE:e8} %{NOTSPACE:e9} %{NOTSPACE:e10} %{NOTSPACE:cipher} %{NOTSPACE:e12} %{NOTSPACE:e13}$" ] } }
    mutate { gsub => [ "message", " [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+$", "" ] }
    grok { match => { "message" => [ "^%{NOTSPACE:date} %{NOTSPACE:time} %{NOTSPACE:f3} %{NOTSPACE:f4} %{NOTSPACE:ip} %{NOTSPACE:method} %{NOTSPACE:server} %{NOTSPACE:uri} %{NOTSPACE:status} %{NOTSPACE:referer}" ] } }
    mutate { gsub => [ "message", "^[^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ ", "" ] }
    mutate { rename => { "message" => "user-agent" } }
input {
  file {
    path => "/var/log/aws/cloudfront.log"
    start_position => "beginning"
    type => "logs"
  }
}

filter {
    grok { match => { "message" => [ "%{NOTSPACE:e1} %{NOTSPACE:e2} %{NOTSPACE:e3} %{NOTSPACE:e4} %{NOTSPACE:domain} %{NOTSPACE:protocol} %{NOTSPACE:e7:int} %{NOTSPACE:e8} %{NOTSPACE:e9} %{NOTSPACE:e10} %{NOTSPACE:cipher} %{NOTSPACE:e12} %{NOTSPACE:e13}$" ] } }
    mutate { gsub => [ "message", " [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+$", "" ] }
    grok { match => { "message" => [ "^%{NOTSPACE:date} %{NOTSPACE:time} %{NOTSPACE:f3} %{NOTSPACE:f4} %{NOTSPACE:ip} %{NOTSPACE:method} %{NOTSPACE:server} %{NOTSPACE:uri} %{NOTSPACE:status} %{NOTSPACE:referer}" ] } }
    mutate { gsub => [ "message", "^[^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ ", "" ] }
    mutate { rename => { "message" => "user-agent" } }
    geoip {
      source => "ip"
    }
}
 
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cloudfront"
  }
}

It runs, but the parsing is not good :frowning:

Does anybody have any ideas? Please help me :slight_smile:

Please paste the entire event from the JSON tab in Kibana. As text, not an image. It's OK to sanitize fields.

{
"_index": "cloudfront",
"_type": "doc",
"_id": "mDGWfsQBpK-ZGgBtR3r",
"_version": 1,
"_score": null,
"_source": {
"user-agent": "2017-01-06\t22:33:35\tDEE50\t368599\t54.110.162.171\tGET\t654w0e1oasspzg1.cloudfront.net\t/js/m_s_main.swf\t200\thttps://test.com/cindex.swf\tMozilla/5.0%2520(Windows%2520NT%252010.0;%2520Win64;%2520x64)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/51.0.2704.79%2520Safari/537.36%2520Edge/14.14393\t-\t-\tHit\te7lPYUlWTe0YnBpUhrQo9iiziN_MX2EykoBBhhdlc3tmqEvkCwCxrA==\ttest.com\thttps\t384\t0.020\t-\tTLSv1.2\tECDHE-RSA-AES128-GCM-SHA256\tHit\tHTTP/1.1",
"@version": "1",
"type": "logs",
"host": "elk.test.net",
"tags": [
"_grokparsefailure",
"_geoip_lookup_failure"
],
"path": "/var/log/aws/aws.log",
"@timestamp": "2018-07-19T12:49:38.623Z"
},
"fields": {
"@timestamp": [
"2018-07-19T12:49:38.623Z"
]
},
"sort": [
1532004578623
]
}

OK, so your fields are tab separated, not space separated. So in all those grok and mutate patterns, where I had a single space separating fields you need to replace that with a tab.

Should I change [^[:space:]]+ to [%[:space:]]+? I cannot find a grok pattern for a tab...

No, I am saying that everywhere there is something like

%{NOTSPACE:e1} %{NOTSPACE:e2}

or

 [^[:space:]]+ [^[:space:]]+

change the space separating the regexps to be a tab, so that you have

%{NOTSPACE:e1}        %{NOTSPACE:e2}

or

    [^[:space:]]+       [^[:space:]]+
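
If a literal tab is hard to type or see in your editor, the regex escape `\t` should work as well, since both grok and mutate's gsub treat the pattern as a regular expression. A short sketch using the first three fields only, not tested against your data:

```
filter {
  # \t is the regex escape for a tab; it replaces the single space between fields.
  grok { match => { "message" => [ "^%{NOTSPACE:date}\t%{NOTSPACE:time}\t%{NOTSPACE:f3}\t" ] } }
  # The same substitution applies to the gsub that strips the parsed fields.
  mutate { gsub => [ "message", "^[^[:space:]]+\t[^[:space:]]+\t[^[:space:]]+\t", "" ] }
}
```

The remaining fields follow the same shape: one `\t` wherever the space-separated version has a single space.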

Thank you, I will test this and write back with the results.

Thank you for the help. I changed the tabs to spaces in the log file and everything works well!

input {
  file {
    path => "/var/log/aws/new.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Last 13 fields, named after the #Fields header. Grok field names cannot contain spaces or parentheses.
  grok { match => { "message" => [ "%{NOTSPACE:cs-uri-query} %{NOTSPACE:cs-cookie} %{NOTSPACE:x-edge-result-type} %{NOTSPACE:x-edge-request-id} %{NOTSPACE:domain} %{NOTSPACE:protocol} %{NOTSPACE:cs-bytes:int} %{NOTSPACE:time-taken} %{NOTSPACE:x-forwarded-for} %{NOTSPACE:ssl-protocol} %{NOTSPACE:cipher} %{NOTSPACE:x-edge-response-result-type} %{NOTSPACE:cs-protocol-version}$" ] } }
  mutate { gsub => [ "message", " [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+$", "" ] }
  # First 10 fields.
  grok { match => { "message" => [ "^%{NOTSPACE:date} %{NOTSPACE:time} %{NOTSPACE:x-edge-location} %{NOTSPACE:sc-bytes} %{NOTSPACE:ip} %{NOTSPACE:method} %{NOTSPACE:server} %{NOTSPACE:uri} %{NOTSPACE:status} %{NOTSPACE:referer}" ] } }
  mutate { gsub => [ "message", "^[^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ [^[:space:]]+ ", "" ] }
  # Whatever is left of the message is the user agent.
  mutate { rename => { "message" => "user-agent" } }
  # Build the timestamp from the parsed date and time; CloudFront logs are in UTC.
  mutate { add_field => { "timestamp" => "%{date} %{time}" } }
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
  geoip {
    source => "ip"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cloudfront"
  }
}

This works for me.
