Logstash configuration for CloudFront logs

Hi guys,

Please help me get CloudFront logs parsed in Logstash. I want each field to be searchable, including query parameters like aid, bid, cid, etc. See the samples below.

Sample CloudFront log

2016-03-29 04:02:08 ABC1 461 22.20.17.8 GET afsaGdhfxghxgh.cloudfront.net /1.gif - Mozilla/5.0%2520(Linux;%2520Android%25205.1.1;%2520SM-G920I%2520Build/LMY47X;%2520wv)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Version/4.0%2520Chrome/48.0.2564.106%2520Mobile%2520Safari/537.36 aid=fsdggg25346&bid=fsdgagsexfdhg&cid=1423690744601076&cb=fdfsdggg&did=fsagdsgg&eid=fDSGzsgdfhdsh - Miss jAk9duSOoOPVfssDGZdfhgxxxghfghpzK35tRuujwuQ== afsaGdhfxghxgh.cloudfront.net https 558 0.715 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss

This one works, but I want the query parameters to be searchable as well:

match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t%{GREEDYDATA:cs_uri_stem}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }

This one does not work:

match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t(?[A-Za-z0-9$.+!'|(){},~@#%&/=:;_?-[]<>^`])?)?)\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }

Logstash Configuration

input {
  file {
    path => "/opt/cloudfront/E2I53NO2J8KEJZ*"
    type => "cloudfront"
    start_position => "beginning"
    sincedb_path => "log_sincedb"
  }
}

filter {
  if [type] == "cloudfront" {
    if (("#Version: 1.0" in [message]) or ("#Fields: date" in [message])) {
      drop {}
    }

    grok {
      match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t(<params>\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>\^\`]*)?)?)\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
    }
  }

  mutate {
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }

  date {
    match => [ "listener_timestamp", "yy-MM-dd HH:mm:ss" ]
  }

  if [params] {
    mutate {
      rename => { "params" => "params[request]" }
    }
    urldecode {
      field => "params[request]"
    }
    kv {
      source => "params[request]"
      field_split => "?&"
      target => "params"
    }
    ruby {
      code => "
        arguments = Array.new
        event['params'].to_hash.each { |k, v|
          if k == 'request'
            next
          end
          arguments << { 'key' => k, 'value' => v }
        }
        unless arguments.empty?
          event['[arguments]'] = arguments
        end
      "
      remove_field => [ "params" ]
    }
  }
}

output {
  stdout { codec => rubydebug }
}

So it's the cs_uri_stem field that contains the data you want to parse further? Keep the original expression that works and use a kv filter to parse cs_uri_stem.
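
Something along these lines should do it (a minimal sketch; cs_uri_stem is the field name from your working grok expression, and the params target name is just an example):

kv {
  source => "cs_uri_stem"   # the query string field from your grok expression
  field_split => "&"
  value_split => "="
  target => "params"        # example target name; pick whatever suits you
}

That turns aid=fsdggg25346&bid=fsdgagsexfdhg&... into separate params.aid, params.bid, ... fields.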

And try to avoid having multiple GREEDYDATA patterns in the same expression. It might seem to work, but it can easily blow up later. If the fields are tab-delimited, why not use the csv filter to extract them instead of grok?
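
For example (a sketch, not tested; the column names mirror your grok field names, with the duplicate cs_uri_stem renamed to cs_uri_query, and the separator has to be a literal tab character typed directly, since \t is not interpreted inside config strings):

csv {
  separator => "	"  # a literal tab character
  columns => [
    "date", "time", "x_edge_location", "sc_bytes", "c_ip", "cs_method",
    "cs_host", "cs_uri_stem", "sc_status", "referrer", "User_Agent",
    "cs_uri_query", "cookies", "x_edge_result_type", "x_edge_request_id",
    "x_host_header", "cs_protocol", "cs_bytes", "time_taken",
    "x_forwarded_for", "ssl_protocol", "ssl_cipher",
    "x_edge_response_result_type"
  ]
}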

Yes, cs_uri_stem contains the data that I want to parse, but if you look at my configuration above you will find that there are two cs_uri_stem fields. So I renamed one cs_uri_stem to cs_uri and used kv as you suggested (see below), but after the logs are loaded into Elasticsearch I am not able to search the parameters, for example aid="xxxxxxxx" AND bid="yyyyyyyyyyy".

input {
  file {
    path => "/opt/cloudfront/E2I53NO2J8KEJZ*"
    type => "cloudfront"
    start_position => "beginning"
    sincedb_path => "log_sincedb"
  }
}

filter {
  if [type] == "cloudfront" {
    if (("#Version: 1.0" in [message]) or ("#Fields: date" in [message])) {
      drop {}
    }

    grok {
      match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t%{GREEDYDATA:cs_uri_stem}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
    }
  }

  mutate {
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }

  date {
    match => [ "listener_timestamp", "yy-MM-dd HH:mm:ss" ]
  }

  if [cs_uri_stem] {
    mutate {
      rename => { "cs_uri_stem" => "cs_uri_stem[request]" }
    }
    urldecode {
      field => "cs_uri_stem[request]"
    }
    kv {
      source => "cs_uri_stem[request]"
      field_split => "?&"
      target => "cs_uri_stem"
    }
    ruby {
      code => "
        arguments = Array.new
        event['cs_uri_stem'].to_hash.each { |k, v|
          if k == 'request'
            next
          end
          arguments << { 'key' => k, 'value' => v }
        }
        unless arguments.empty?
          event['[arguments]'] = arguments
        end
      "
      remove_field => [ "cs_uri_stem" ]
    }
  }
}

output {
  stdout { codec => rubydebug }
}

Do you think csv would be better than grok in this use case?

but after the logs are loaded into Elasticsearch I am not able to search the parameters, for example aid="xxxxxxxx" AND bid="yyyyyyyyyyy"

What do the resulting events look like? Please show the output of stdout { codec => rubydebug }.

{
    "message" => "2016-03-29\t04:02:08\tABC1\t461\t22.20.17.8\tGET\tafsaGdhfxghxgh.cloudfront.net\t/1.gif\t-\tMozilla/5.0%2520(Linux;%2520Android%25205.1.1;%2520SM-G920I%2520Build/LMY47X;%2520wv)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Version/4.0%2520Chrome/48.0.2564.106%2520Mobile%2520Safari/537.36\taid=fsdggg25346&bid=fsdgagsexfdhg&cid=1423690744601076&cb=fdfsdggg&did=fsagdsgg&eid=fDSGzsgdfhdsh\t\t Miss\tjAk9duSOoOPVfssDGZdfhgxxxghfghpzK35tRuujwuQ==\tafsaGdhfxghxgh.cloudfront.net\thttps\t558\t0.715\t\t TLSv1.2\tECDHE-RSA-AES128-GCM-SHA256\tMiss",
    "@version" => "1",
    "@timestamp" => "2016-03-29T01:04:12.000Z",
    "path" => "/opt/cloudfront/E2I53NO2J8KEJZ",
    "host" => "localhost",
    "type" => "cloudfront",
    "date" => "16-03-27",
    "time" => "01:04:12",
    "x_edge_location" => "ABC1",
    "sc_bytes" => "461",
    "c_ip" => "22.20.17.8",
    "cs_method" => "GET",
    "cs_host" => "afsaGdhfxghxgh.cloudfront.net",
    "cs_uri" => "/1.gif",
    "sc_status" => "200",
    "referrer" => "-",
    "User_Agent" => "Mozilla/5.0%2520(Linux;%2520U;%2520Android%25204.2.2;%2520en-gb;%2520SM-T110%2520Build/JDQ39)%2520AppleWebKit/534.30%2520(KHTML,%2520like%2520Gecko)%2520Version/4.0%2520Safari/534.30",
    "cookies" => "-",
    "x_edge_result_type" => "Miss",
    "x_edge_request_id" => "jAk9duSOoOPVfssDGZdfhgxxxghfghpzK35tRuujwuQ==",
    "x_host_header" => "afsaGdhfxghxgh.cloudfront.net",
    "cs_protocol" => "https",
    "cs_bytes" => "581",
    "time_taken" => "0.042",
    "x_forwarded_for" => "-",
    "ssl_protocol" => "TLSv1",
    "ssl_cipher" => "ECDHE-RSA-AES128-SHA",
    "x_edge_response_result_type" => "Miss",
    "received_at" => "2016-03-29T10:16:46.815Z",
    "listener_timestamp" => "16-03-27 01:04:12",
    "arguments" => [
        [0] {
            "key" => "aid",
            "value" => "432432546376879869"
        },
        [1] {
            "key" => "bid",
            "value" => "reawca54rsyxdfhgtf"
        },
        [2] {
            "key" => "cid",
            "value" => "gzsdfhbxdfhx35q"
        },
        [3] {
            "key" => "did",
            "value" => "35434w65474ew"
        },
        [4] {
            "key" => "eid",
            "value" => "43r536w456"
        }
    ]
}

After loading this log, I see that the "arguments" field is not indexed and hence not searchable.

You probably don't want arguments to be an array of objects. Searches are not going to work like you expect them to. Instead, I suggest you aim for

"arguments": {
  "aid": "432432546376879869",
  "bid": "reawca54rsyxdfhgtf",
  ...
}

which is what the kv filter should give you out of the box.
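
In other words, something like this instead of the ruby filter (a sketch; it assumes the urldecoded query string ends up in a field of its own, here called cs_uri_query):

kv {
  source => "cs_uri_query"  # assumed name of the urldecoded query string field
  field_split => "&"
  target => "arguments"
}

Each parameter then becomes a subfield, so in Kibana you can query e.g. arguments.aid:"432432546376879869" AND arguments.bid:"reawca54rsyxdfhgtf".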

I used just the kv filter and it works as expected, but it looks like when searching in Kibana the count of log lines and the count of documents in ES are different. Is it because I am using the kv filter? Do we have any alternative to the kv filter to achieve what you described in your last response?

You have to be more specific than "when searching in Kibana the count of log lines and the count of documents in ES are different".

I don't think the kv filter has anything to do with this.