S3 input logs are not being parsed line by line

JaMz_Jack_GN · May 28, 2019, 8:24pm

Hello,

I'm trying to ingest EMR logs from S3, but we have noticed that the message field is being populated with the whole log contents, not line by line as we would expect.

input {
  s3 {
    aws_credentials_file => "/etc/aws-creds/config.yaml"
    region => "${AWS_REGION}"
    bucket => "${BUCKET_NAME}"
    prefix => "logs/"
    backup_add_prefix => "logs-processed/"
    backup_to_bucket => "${BUCKET_NAME}"
    interval => 120
    delete => true
    add_field => {
      "type" => "emr_job"
    }
  }
}

output {
  kafka {
    topic_id => "logstash"
    bootstrap_servers => "${KAFKA_ENDPOINT}:9092"
    codec => json
  }
}

One thing I find interesting is that the logs are being tagged as multiline by default, even when not using that codec. We have tried using different codecs such as multiline and gzip_lines (because of the file format) but we are still not having any luck.

When looking at the contents of the log file itself, each line is separated by a newline (I used :set list in vim to confirm this). This is a pretty strange issue, and I am curious whether it is specific to the behavior of the s3 input plugin.
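
One way to narrow this down might be a stripped-down debug pipeline like the sketch below. It is only an assumption about how to isolate the problem, not a confirmed fix: the codec is pinned explicitly to plain (which, as far as I know, is the s3 input's documented default) so that no multiline codec is involved, the delete and backup options are left out so the source objects stay untouched while testing, and events are printed to stdout instead of Kafka so the message and tags fields can be inspected directly.

```
input {
  s3 {
    aws_credentials_file => "/etc/aws-creds/config.yaml"
    region => "${AWS_REGION}"
    bucket => "${BUCKET_NAME}"
    prefix => "logs/"
    # Assumption: set the codec explicitly to rule out a multiline codec
    # being picked up from somewhere else in the configuration.
    codec => plain
    # delete / backup_* intentionally omitted so test objects are not moved.
  }
}

output {
  # Print each event in full (message, tags, added fields) for inspection.
  stdout {
    codec => rubydebug
  }
}
```

If each event printed this way still contains the whole file in message and still carries the multiline tag, that would suggest the behavior comes from the input side rather than from the Kafka output or its json codec.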
