Hello All,
I'm relatively new to the ELK world, and I'm having trouble making the information in the CloudFront logs searchable in Kibana.
Let me explain in detail:
- I was able to have CF (CloudFront) logs delivered to a Logstash instance and view them in Kibana.
- The problem is that the "log data" arrives as one giant "message" blob (see below):
{"x_edge_result_type":"Miss","x_forwarded_for":"-","sc_content_len":"42109","@version":"1","time_to_first_byte":"1.039","x_edge_detailed_result_type":"Miss","cookies":"-","sc_range_end":"-","useragent":{"os":"Debian","name":"Other","build":"","os_name":"Debian","device":"Other"},"sc_bytes":43130,"@timestamp":"2020-09-12T23:05:11.000Z","c_ip":"111.222.333.444","cs_protocol_version":"HTTP/1.1","monthday":"12","type":"access-cf-repos","cs_host":"d111111111.cloudfront.net","fle_status":"-","geoip":{"region_code":"OR","postal_code":"97818","latitude":45,"continent_code":"NA","longitude":-119,"location":{"lat":45,"lon":-119},"country_code3":"US","dma_code":810,"ip":"111.222.333.444","city_name":"Boardman","country_code2":"US","region_name":"Oregon","timezone":"America/Los_Angeles","country_name":"United States"},"referrer":"-","month":"09","x_edge_request_id":"CgmMEnPPv0NNcIEijDOWVBcLJmvp","cs_protocol":"https","fle_encrypted_fields":"-","time_taken":1.04,"c_port":26985,"x_edge_response_result_type":"Miss","cs_uri_query":"-","year":"2020","sc_content_type":"application/octet-stream","cs_bytes":1265,"x_host_header":"xxx.xxx.net","ssl_protocol":"TLSv1.2","sc_range_start":"-","cs_method":"GET","sc_status":200,"x_edge_location":"HIO51-C1","ssl_cipher":"ECDHE-SHA256","cs_uri_stem":"/dists/xenial/InRelease"}
This is how it shows up in Kibana, which is not ideal, since I can't filter on the individual "fields" I care about.
For example, if I want to search for messages matching the following criteria:
"type":"access-cf-repos"
"c_ip":"111.222.333.444"
"sc_status":200
I have to fall back to the search bar with Lucene query syntax (which is ugly), because those fields are not listed under "Available Fields".
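What I'd like is for each of those to show up as its own field in Discover, so I could filter with something like this (roughly what I'm hoping to be able to type once the fields actually exist):
type:"access-cf-repos" AND c_ip:"111.222.333.444" AND sc_status:200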
If I understand correctly, this comes down to how my Logstash config file is structured.
Here is a snippet of my Logstash config file:
input {
  s3 {
    "bucket"   => "dist-log"
    "prefix"   => "distribution-log/"
    "type"     => "access-cf-repos"
    "region"   => "us-west-2"
    "interval" => "60"
    "delete"   => "true"
  }
}

filter {
  grok {
    match => { "message" => "(?<date>%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:monthday})\t%{TIME:time}\t(?<x_edge_location>[\w\-]+)\t(?:%{NUMBER:sc_bytes:int}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status:int}\t%{NOTSPACE:referrer}\t%{NOTSPACE:User_Agent}\t%{NOTSPACE:cs_uri_query}\t%{NOTSPACE:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{WORD:cs_protocol}\t%{NUMBER:cs_bytes:int}\t%{NUMBER:time_taken:float}\t%{NOTSPACE:x_forwarded_for}\t%{NOTSPACE:ssl_protocol}\t%{NOTSPACE:ssl_cipher}\t%{WORD:x_edge_response_result_type}\t%{NOTSPACE:cs_protocol_version}\t%{NOTSPACE:fle_status}\t%{NOTSPACE:fle_encrypted_fields}\t%{NUMBER:c_port:int}\t%{NOTSPACE:time_to_first_byte}\t%{WORD:x_edge_detailed_result_type}\t%{NOTSPACE:sc_content_type}\t%{NOTSPACE:sc_content_len}\t%{NOTSPACE:sc_range_start}\t%{NOTSPACE:sc_range_end}" }
  }
  mutate {
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }
  date {
    match  => [ "listener_timestamp", "yyyy-MM-dd HH:mm:ss" ]
    target => "@timestamp"
  }
  geoip {
    source => "c_ip"
  }
  useragent {
    source => "User_Agent"
    target => "useragent"
  }
  mutate {
    remove_field => ["date", "time", "listener_timestamp", "cloudfront_version", "message", "cloudfront_fields", "User_Agent"]
  }
}
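For reference, this is roughly what a raw CloudFront log line looks like before the grok filter runs. I reconstructed it from the JSON above, so the user-agent value is just a placeholder (I only have its parsed form), and the fields are tab-separated in the real file even though I'm showing spaces here:
2020-09-12 23:05:11 HIO51-C1 43130 111.222.333.444 GET d111111111.cloudfront.net /dists/xenial/InRelease 200 - PlaceholderAgent/1.0 - - Miss CgmMEnPPv0NNcIEijDOWVBcLJmvp xxx.xxx.net https 1265 1.040 - TLSv1.2 ECDHE-SHA256 Miss HTTP/1.1 - - 26985 1.039 Miss application/octet-stream 42109 - -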
Is there a better way to break down all the fields inside that giant message blob?
Are there plugins I could use, or examples I could follow?
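For example, would something like the csv filter with a tab separator be a cleaner way to split these out? Here is a rough, untested sketch of what I have in mind (the column names are just copied from my grok pattern, the separator is a literal tab character, and I'd presumably still need to handle the header lines CloudFront puts at the top of each log file):
filter {
  csv {
    # CloudFront access logs are tab-delimited; this is a literal tab character
    separator => "	"
    # column names copied from my grok pattern above
    columns => [
      "date", "time", "x_edge_location", "sc_bytes", "c_ip", "cs_method", "cs_host",
      "cs_uri_stem", "sc_status", "referrer", "User_Agent", "cs_uri_query", "cookies",
      "x_edge_result_type", "x_edge_request_id", "x_host_header", "cs_protocol",
      "cs_bytes", "time_taken", "x_forwarded_for", "ssl_protocol", "ssl_cipher",
      "x_edge_response_result_type", "cs_protocol_version", "fle_status",
      "fle_encrypted_fields", "c_port", "time_to_first_byte", "x_edge_detailed_result_type",
      "sc_content_type", "sc_content_len", "sc_range_start", "sc_range_end"
    ]
    # same numeric conversions I currently do with :int / :float in grok
    convert => {
      "sc_bytes"   => "integer"
      "sc_status"  => "integer"
      "cs_bytes"   => "integer"
      "time_taken" => "float"
      "c_port"     => "integer"
    }
  }
}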
Thank you in advance!