manishr (Manish R), March 29, 2016, 8:28am (#1)
Hi Guys,
Help me get CloudFront logs parsed in Logstash. I want each field to be searchable, even parameters like aid, bid, cid, etc. See below.
Sample CloudFront log:
2016-03-29 04:02:08 ABC1 461 22.20.17.8 GET afsaGdhfxghxgh.cloudfront.net /1.gif - Mozilla/5.0%2520(Linux;%2520Android%25205.1.1;%2520SM-G920I%2520Build/LMY47X;%2520wv)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Version/4.0%2520Chrome/48.0.2564.106%2520Mobile%2520Safari/537.36 aid=fsdggg25346&bid=fsdgagsexfdhg&cid=1423690744601076&cb=fdfsdggg&did=fsagdsgg&eid=fDSGzsgdfhdsh - Miss jAk9duSOoOPVfssDGZdfhgxxxghfghpzK35tRuujwuQ== afsaGdhfxghxgh.cloudfront.net https 558 0.715 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss
Working, but I want the parameters to be searchable as well:
match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t%{GREEDYDATA:cs_uri_stem}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
Not working:
match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t(?[A-Za-z0-9$.+!'|(){},~@#%&/=:;_?-[]<>^`] )?)?)\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
Logstash Configuration
input {
  file {
    path => "/opt/cloudfront/E2I53NO2J8KEJZ*"
    type => "cloudfront"
    start_position => "beginning"
    sincedb_path => "log_sincedb"
  }
}
filter {
  if [type] == "cloudfront" {
    if ( ("#Version: 1.0" in [message]) or ("#Fields: date" in [message]) ) {
      drop {}
    }
    grok {
      match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t(?<params>\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>\^\`]*)?)?)\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
    }
  }
  mutate {
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }
  date {
    match => [ "listener_timestamp", "yy-MM-dd HH:mm:ss" ]
  }
  if [params] {
    mutate {
      rename => { "params" => "params[request]" }
    }
    urldecode {
      field => "params[request]"
    }
    kv {
      source => "params[request]"
      field_split => "?&"
      target => "params"
    }
    ruby {
      code => "
        arguments = Array.new
        event['params'].to_hash.each { |k, v|
          next if k == 'request'
          arguments << { 'key' => k, 'value' => v }
        }
        unless arguments.empty?
          event['[arguments]'] = arguments
        end
      "
      remove_field => [ "params" ]
    }
  }
}
output {
  stdout { codec => rubydebug }
}
So it's the cs_uri_stem field that contains the data you want to parse further? Keep the original expression that works and use a kv filter to parse cs_uri_stem.
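Something along these lines, as an untested sketch (adjust source to whichever field your grok expression actually captures the query string into):

kv {
  # turns "aid=fsdggg25346&bid=..." into aid, bid, ... fields under [params]
  source => "cs_uri_stem"
  field_split => "&"
  target => "params"
}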
And try to avoid having multiple GREEDYDATA patterns in the same expression. It might seem to work, but it can easily blow up later. If the fields are tab-delimited, why not use the csv filter to extract the fields instead of grok? A rough sketch follows.
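A sketch of the csv approach, not a drop-in replacement: the column names are taken from your grok expression, except that I've called the query-string column cs_uri_query to avoid the duplicate cs_uri_stem. Note that the separator must be a literal tab character, since config strings don't interpret \t.

csv {
  separator => "	"  # literal tab character
  columns => [
    "date", "time", "x_edge_location", "sc_bytes", "c_ip", "cs_method",
    "cs_host", "cs_uri_stem", "sc_status", "referrer", "User_Agent",
    "cs_uri_query", "cookies", "x_edge_result_type", "x_edge_request_id",
    "x_host_header", "cs_protocol", "cs_bytes", "time_taken",
    "x_forwarded_for", "ssl_protocol", "ssl_cipher",
    "x_edge_response_result_type"
  ]
}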
manishr (Manish R), March 29, 2016, 10:07am (#3)
Yes, cs_uri_stem contains the data that I want to parse, but if you look at my configuration above you will find that there are two cs_uri_stem fields. So I changed one cs_uri_stem to cs_uri and used kv as suggested by you, as below, but after the logs are loaded into Elasticsearch I am not able to search the parameters, for example aid="xxxxxxxx" AND bid="yyyyyyyyyyy".
input {
  file {
    path => "/opt/cloudfront/E2I53NO2J8KEJZ*"
    type => "cloudfront"
    start_position => "beginning"
    sincedb_path => "log_sincedb"
  }
}
filter {
  if [type] == "cloudfront" {
    if ( ("#Version: 1.0" in [message]) or ("#Fields: date" in [message]) ) {
      drop {}
    }
    grok {
      match => { "message" => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri}\t%{NUMBER:sc_status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t%{GREEDYDATA:cs_uri_stem}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes}\t%{GREEDYDATA:time_taken}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}" }
    }
  }
  mutate {
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }
  date {
    match => [ "listener_timestamp", "yy-MM-dd HH:mm:ss" ]
  }
  if [cs_uri_stem] {
    mutate {
      rename => { "cs_uri_stem" => "cs_uri_stem[request]" }
    }
    urldecode {
      field => "cs_uri_stem[request]"
    }
    kv {
      source => "cs_uri_stem[request]"
      field_split => "?&"
      target => "cs_uri_stem"
    }
    ruby {
      code => "
        arguments = Array.new
        event['cs_uri_stem'].to_hash.each { |k, v|
          next if k == 'request'
          arguments << { 'key' => k, 'value' => v }
        }
        unless arguments.empty?
          event['[arguments]'] = arguments
        end
      "
      remove_field => [ "cs_uri_stem" ]
    }
  }
}
output {
  stdout { codec => rubydebug }
}
Do you think csv would be better than grok in this use case?
but after the logs are loaded into Elasticsearch I am not able to search the parameters, for example aid="xxxxxxxx" AND bid="yyyyyyyyyyy"
What do the resulting events look like? Please show the output from stdout { codec => rubydebug }.
manishr (Manish R), March 29, 2016, 11:14am (#5)
{
  "message" => "2016-03-29\t04:02:08\tABC1\t461\t22.20.17.8\tGET\tafsaGdhfxghxgh.cloudfront.net\t/1.gif\t-\tMozilla/5.0%2520(Linux;%2520Android%25205.1.1;%2520SM-G920I%2520Build/LMY47X;%2520wv)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Version/4.0%2520Chrome/48.0.2564.106%2520Mobile%2520Safari/537.36\taid=fsdggg25346&bid=fsdgagsexfdhg&cid=1423690744601076&cb=fdfsdggg&did=fsagdsgg&eid=fDSGzsgdfhdsh\t-\tMiss\tjAk9duSOoOPVfssDGZdfhgxxxghfghpzK35tRuujwuQ==\tafsaGdhfxghxgh.cloudfront.net\thttps\t558\t0.715\t-\tTLSv1.2\tECDHE-RSA-AES128-GCM-SHA256\tMiss",
  "@version" => "1",
  "@timestamp" => "2016-03-29T01:04:12.000Z",
  "path" => "/opt/cloudfront/E2I53NO2J8KEJZ",
  "host" => "localhost",
  "type" => "cloudfront",
  "date" => "16-03-27",
  "time" => "01:04:12",
  "x_edge_location" => "ABC1",
  "sc_bytes" => "461",
  "c_ip" => "22.20.17.8",
  "cs_method" => "GET",
  "cs_host" => "afsaGdhfxghxgh.cloudfront.net",
  "cs_uri" => "/1.gif",
  "sc_status" => "200",
  "referrer" => "-",
  "User_Agent" => "Mozilla/5.0%2520(Linux;%2520U;%2520Android%25204.2.2;%2520en-gb;%2520SM-T110%2520Build/JDQ39)%2520AppleWebKit/534.30%2520(KHTML,%2520like%2520Gecko)%2520Version/4.0%2520Safari/534.30",
  "cookies" => "-",
  "x_edge_result_type" => "Miss",
  "x_edge_request_id" => "jAk9duSOoOPVfssDGZdfhgxxxghfghpzK35tRuujwuQ==",
  "x_host_header" => "afsaGdhfxghxgh.cloudfront.net",
  "cs_protocol" => "https",
  "cs_bytes" => "581",
  "time_taken" => "0.042",
  "x_forwarded_for" => "-",
  "ssl_protocol" => "TLSv1",
  "ssl_cipher" => "ECDHE-RSA-AES128-SHA",
  "x_edge_response_result_type" => "Miss",
  "received_at" => "2016-03-29T10:16:46.815Z",
  "listener_timestamp" => "16-03-27 01:04:12",
  "arguments" => [
    [0] {
      "key" => "aid",
      "value" => "432432546376879869"
    },
    [1] {
      "key" => "bid",
      "value" => "reawca54rsyxdfhgtf"
    },
    [2] {
      "key" => "cid",
      "value" => "gzsdfhbxdfhx35q"
    },
    [3] {
      "key" => "did",
      "value" => "35434w65474ew"
    },
    [4] {
      "key" => "eid",
      "value" => "43r536w456"
    }
  ]
}
After loading this log, I see that the "arguments" field is not indexed and hence not searchable.
You probably don't want arguments to be an array of objects. Searches are not going to work like you expect them to. Instead, I suggest you aim for
"arguments": {
"aid": "432432546376879869",
"bid": "reawca54rsyxdfhgtf",
...
}
which is what the kv filter should give you out of the box.
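In your configuration that means you can drop the ruby filter entirely and point kv's target at arguments, something like this (a sketch, assuming the urldecoded query string still sits in [cs_uri_stem][request] as in your config above):

kv {
  # write each key/value pair as a subfield of [arguments]
  source => "[cs_uri_stem][request]"
  field_split => "?&"
  target => "arguments"
}

The parameters then index as arguments.aid, arguments.bid, and so on, and queries like arguments.aid:"xxxxxxxx" should work.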
manishr (Manish R), April 4, 2016, 1:53pm (#7)
I used just the kv filter and it is working as expected, but while searching in Kibana the count of logs and the ES data is different. Is it because I am using the kv filter? Do we have any alternative to the kv filter to achieve what you described in your last response?
You have to be more specific than "while searching in Kibana the count of logs and the ES data is different".
I don't think the kv filter has anything to do with this.