S3 input logs are not being parsed line by line


I'm trying to ingest EMR logs from S3, but we have noticed that the message field is being populated with the whole log contents, not line by line as we would expect.

input {
  s3 {
    aws_credentials_file => "/etc/aws-creds/config.yaml"
    region => "${AWS_REGION}"
    bucket => "${BUCKET_NAME}"
    prefix => "logs/"
    backup_add_prefix => "logs-processed/"
    backup_to_bucket => "${BUCKET_NAME}"
    interval => 120
    delete => true
    add_field => {
      "type" => "emr_job"

output {
  kafka {
    topic_id => "logstash"
    bootstrap_servers => "${KAFKA_ENDPOINT}:9092"
    codec => json

One thing I find interesting is that the logs are being tagged as multiline by default, even when not using that codec. We have tried using different codecs such as multiline and gzip_lines (because of the file format) but we are still not having any luck.

When looking at the contents of the log file itself, each line is separated by a newline (using set limit in vim to see this). This is a pretty strange issue and I am curious if this is specific to behavior of the s3 input plugin.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.