Hello,
I'm trying to ingest EMR logs from S3, but the message field of each event is being populated with the entire contents of a log file, not one line per event as we would expect. Our pipeline configuration:
input {
  s3 {
    aws_credentials_file => "/etc/aws-creds/config.yaml"
    region => "${AWS_REGION}"
    bucket => "${BUCKET_NAME}"
    prefix => "logs/"
    backup_add_prefix => "logs-processed/"
    backup_to_bucket => "${BUCKET_NAME}"
    interval => 120
    delete => true
    add_field => {
      "type" => "emr_job"
    }
  }
}
output {
  kafka {
    topic_id => "logstash"
    bootstrap_servers => "${KAFKA_ENDPOINT}:9092"
    codec => json
  }
}
One thing I find interesting is that the events are being tagged as multiline by default, even though we are not using that codec. We have tried different codecs, such as multiline and gzip_lines (because of the file format), but we are still not having any luck.
When looking at the contents of the log file itself, each line is separated by a newline (using :set list in vim to see this). This is a pretty strange issue, and I am curious whether this behavior is specific to the s3 input plugin.
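
For anyone who wants to repeat the newline check outside of vim, here is a minimal Python sketch that counts the line terminators in a downloaded log file. It is not part of Logstash or the s3 input plugin; the function names (maybe_gunzip, count_line_endings) and the command-line usage are made up for illustration, and it assumes the file is either plain text or gzip-compressed.

```python
import gzip
import sys

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def maybe_gunzip(raw):
    """Return decompressed bytes if raw looks like gzip, else raw unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def count_line_endings(data):
    """Count LF, CRLF, and bare CR terminators in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"lf": lf, "crlf": crlf, "cr": cr}


if __name__ == "__main__":
    # Hypothetical usage: python check_endings.py <path-to-downloaded-log>
    with open(sys.argv[1], "rb") as fh:
        contents = maybe_gunzip(fh.read())
    print(count_line_endings(contents))
```

If the counts show mostly bare CR terminators, or no terminators at all, the file is not newline-delimited in the way a line-oriented reader expects. If they show LF or CRLF, the file itself is fine and the problem is more likely in how the pipeline reads it.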