How to remove some text from a JSON file using a Logstash filter plugin? Which plugin to use?


(omkar) #1

We are trying to load CloudTrail logs into Elasticsearch using Logstash, but the JSON is in the following format:

{"Records":[{"eventVersion":"1.05","userIdentity":{"type":"AWSService","invokedBy":"config.amazonaws.com"}........... (JSON file continues)

If we load the log without any filter plugin, all the fields (awsRegion, type, invokedBy, etc.) end up in the message field only, which is of no use to us.

We tried several different configurations of the mutate filter plugin, shown below, but none of them worked.

filter {
  mutate {
    # Each of the following settings was tried.
    remove_field => ["Records"]
    replace => { "message" => "Records:" }
    update => { "Records" => "" }
  }
}
Any help will be appreciated.
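
One possible reason none of the mutate settings has any effect: without a JSON codec or filter, the event holds only the raw text in message, so there is no Records field for mutate to act on. A minimal sketch, assuming each event carries the whole CloudTrail document as a single string in message, is to parse that string with the json filter first:

```
filter {
  # Assumption: "message" holds the complete JSON document as one string.
  json {
    source => "message"
  }
}
```

Under that assumption, Records would then exist on the event as an array field that later filters can work with.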


(Magnus Bäck) #2

Use a json_lines codec in your input plugin to deserialize the JSON strings in each input event.


(omkar) #3

I tried using the json_lines codec, but it didn't work.

input {
  s3 {
    "access_key_id" => "AKIAIELI3NYPJTWBGYKA"
    "secret_access_key" => "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"
    "bucket" => "cloudtraillogsvirtusacloud"
    "region" => "sa-east-1"
    "prefix" => "AWSLogs/912607726479/CloudTrail/sa-east-1/2018/05/03"
    "codec" => "json_lines"
  }
}

Any suggestions?


(Magnus Bäck) #4

Exactly what do the events processed by Logstash look like? Use a stdout { codec => rubydebug } output.


(omkar) #5

We are not getting any output at all, even with stdout { codec => rubydebug }.
If we manually download the JSON file and edit it to remove the "Records" field at the start of the file, then the following config processes the file and makes all the fields available. The JSON file is attached.

input {
  exec {
    command => "cat 912607726479_CloudTrail_sa-east-1_20180504T0000Z_yK8mHG7E05CkUtrF.json"
    codec => json_lines
    interval => 60
  }
}
output {
  stdout { codec => rubydebug }
}

Output:
All the fields come through separately (awsRegion, eventType, eventSource, etc.).

However, our requirement is to pull the hundreds of logs generated daily from the S3 bucket, and unless the "Records" field is removed from the logs, they are not processed. Please help us achieve this.
This is the config file we used:
input {
  s3 {
    "access_key_id" => "AKIAIELI3NYPJTWBGYKA"
    "secret_access_key" => "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"
    "bucket" => "cloudtraillogsvirtusacloud"
    "region" => "sa-east-1"
    "prefix" => "AWSLogs/912607726479/CloudTrail/sa-east-1/2018/05/03"
    "codec" => "json_lines"
  }
}
filter {
  mutate {
    remove_field => ["Records"]
  }
}
output {
  stdout { codec => rubydebug }
}


(Magnus Bäck) #6

But our requirement is to pull hundreds of logs generated daily from S3 bucket. But without removing "Records" field from logs it is not processing.

What do you mean by that? There should be errors or warnings in the log file.


(omkar) #7

We have already achieved the same thing with a Logz.io setup, which provides an interface with various log shippers: using the AWS CloudTrail log shipper, we configured our S3 bucket and saved it. After that, we could view the different fields from our CloudTrail logs in Kibana.

We are trying to achieve the same with an ELK Stack setup using the Logstash S3 plugin, as shown earlier in this conversation. The main thing is to remove the text "Records" from the logs so that we get all the fields and can view them in Kibana.

Any idea on this will be helpful.
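
A minimal sketch of one way this might be done without editing the files, assuming each CloudTrail file is a single JSON object whose Records array holds the individual events (as in the sample in the first post). The json codec parses the whole object, the split filter emits one event per array element, and a ruby filter copies the nested keys to the top level. The bucket name and prefix are placeholders, credentials are assumed to come from the environment or an instance role, and the ruby block is an illustrative assumption, not something confirmed in this thread:

```
input {
  s3 {
    bucket => "my-cloudtrail-bucket"   # placeholder bucket name
    region => "sa-east-1"
    prefix => "AWSLogs/"               # placeholder prefix
    codec  => "json"                   # parse each line as one JSON document
  }
}
filter {
  # Emit one event per element of the Records array.
  split {
    field => "Records"
  }
  # Assumption: after split, Records is a single hash; copy its keys to the top level.
  ruby {
    code => '
      record = event.get("Records")
      if record.is_a?(Hash)
        record.each { |key, value| event.set(key, value) }
        event.remove("Records")
      end
    '
  }
}
output {
  stdout { codec => rubydebug }
}
```

If this works as intended, fields such as awsRegion should appear as top-level fields on each event, which the rubydebug output makes easy to check before switching the output to Elasticsearch.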


(omkar) #8

A log file is generated in the S3 bucket every 5 minutes for every region. We need to analyze these logs to extract some important information from them.

