Parse Amazon S3 access logs split across multiple files

Hello All,

I am trying to parse Amazon S3 access logs that are split across many files. By "multiple chunks" I mean there are thousands of separate files, each containing log data, and the data is not JSON. I am having a tough time figuring out how to parse these files with Logstash. Below is the config I am using, but I do not see any data showing up in Elasticsearch.

input {
  file {
    type => "s3-access-log"
    path => "/opt/s3logs/logs/*"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}
filter {
  if [type] == "s3-access-log" {
    grok {
      match => { "message" => "%{S3_ACCESS_LOG}" }
    }
    date {
      locale => "en"
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}
output {
  elasticsearch {
    host => ["elk-prod-02-data01"]
    index => "niraj-hello-s3-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}

Any help would be great.

Regards
Niraj

Are you getting anything in stdout?
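
If not, a quick way to isolate the problem is to cut the config down to the file input and a stdout output, so the Elasticsearch side is out of the picture entirely. A minimal sketch, reusing your input settings:

input {
  file {
    type => "s3-access-log"
    path => "/opt/s3logs/logs/*"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}
output {
  stdout { codec => rubydebug }
}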

{
"message" => "789235740a39f4da1f30c076c2dc42e7fdaae8302b896ea3386faa6eeb544c2f niraj-hello-na [23/Jun/2016:13:42:04 +0000] xx.xxx.43.30 arn:aws:iam::xxxxxxxx:user/niraj 0BFACD82FD8581AF REST.GET.LOGGING_STATUS - "GET /?logging HTTP/1.1" 200 - 235 - 6 - "-" "Cloudlytics, aws-sdk-java/1.9.16 Linux/3.13.0-36-generic OpenJDK_64-Bit_Server_VM/24.65-b04/1.7.0_65" -",
"@version" => "1",
"@timestamp" => "2016-06-23T13:42:04.000Z",
"type" => "s3-access-log",
"host" => "elk-prod-02-app01",
"path" => "/opt/s3logs/logs/2016-06-23-14-32-08-63692AC56311762B",
"owner" => "789235740a39f4da1f30c076c2dc42e7fdaae8302b896ea3386faa6eeb544c2f",
"bucket" => "niraj-hello-na",
"timestamp" => "23/Jun/2016:13:42:04 +0000",
"clientip" => "xx.xxx.43.30",
"requester" => "arn:aws:iam::xxxxxxxxxxx:user/jeffm",
"request_id" => "0BFACD82FD8581AF",
"operation" => "REST.GET.LOGGING_STATUS",
"key" => "-",
"verb" => "GET",
"request" => "/?logging",
"httpversion" => "1.1",
"response" => 200,
"bytes" => 235,
"request_time_ms" => 6,
"referrer" => ""-"",
"agent" => ""Cloudlytics, aws-sdk-java/1.9.16 Linux/3.13.0-36-generic OpenJDK_64-Bit_Server_VM/24.65-b04/1.7.0_65""
}
Jul 18, 2016 2:27:38 AM org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$4 handleException
WARNING: [logstash-elk-prod-02-app01-7804-9784] failed to send ping to [[#zen_unicast_1#][elk-prod-02-app01][inet[elk-prod-02-data01/xxx.16.0.166:9200]]]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[elk-prod-02-data01/xxx.16.0.166:9200]][internal:discovery/zen/unicast_gte_1_4] request_id [174] timed out after [3751ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

This is what I get now after adding a port to the Logstash config. What I don't understand is that even though stdout suggests events are being processed and sent to Elasticsearch, I do not see any index being created there.

ES version: 1.5
LS version: 1.5

Why are you on such an old LS version?
You should also switch to the http protocol.
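
For context on the ping timeout above: with the default node protocol, the LS 1.x elasticsearch output tries to join the ES cluster over the transport layer (port 9300 by default), so pointing it at 9200 (the HTTP port) makes the zen unicast ping time out. With protocol => "http" the output talks to the REST API on 9200 instead. A rough sketch of the output section, reusing the host and index from your config:

output {
  elasticsearch {
    protocol => "http"
    host => ["elk-prod-02-data01"]
    index => "niraj-hello-s3-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}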

Mark,

The reason I am on an old version is that our current ES is 1.5 and I did not want to upgrade it just yet, so I picked what I thought was a matching version of LS.

When you say switch to the http protocol, do you mean I should specify protocol => "http" in the output section of the Logstash config?

LS and ES versions are not locked together like that; you can run LS 2.3.X with ES 1.5.X.

And yes, that is correct.

I didn't know that. I will definitely try that and let you know.

After upgrading to LS 1.5.3 and adding protocol => "http", the S3 logs were ingested successfully. Somehow the LS 1.5.0 build I had been running kept crashing after a few successful runs.
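
For reference, a quick way to confirm the daily indices are actually being created is ES's _cat API (host name as in the config above):

curl 'http://elk-prod-02-data01:9200/_cat/indices/niraj-hello-s3-*?v'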

Thanks a lot, Mark, for pointing it out.