Logstash-s3-plugin not importing data or creating sincedb file


#1

I have an S3 bucket containing access logs from other S3 buckets (downloads, etc.). I pulled the https://hub.docker.com/r/sebp/elk/ image and then installed the S3 input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html). I created a logstash.conf file:

input {
    s3 {
      access_key_id => "XXXX"
      secret_access_key => "XXXX"
      bucket => "mybucket"
      region => "us-east-1"
      prefix => "logs/"
      type => "s3"
    }
}

output { 
  elasticsearch { hosts => ["localhost:9200"] }
}

I then run it with "/opt/logstash/bin/logstash --path.data /tmp/logstash/data -f config/logstash.conf". The S3 input seems to start correctly, and if I turn on --debug I can see it finding the log objects:

[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] objects length is: {:length=>26999}
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] S3 input: Found key {:key=>"logs/2016-05-21-13-32-20-1F8C0D40595F3975"}
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] S3 input: Adding to objects {:key=>"logs/2016-05-21-13-32-20-1F8C0D405

but I never see any data in Elasticsearch. I added:
sincedb_path => "/etc/s3backup/sincedb"
to my .conf file, but nothing ever appears in /etc/s3backup.

Have I made an obvious mistake? What else can I do to debug this?


(Jordan Sissel) #2

For these logs, how many objects is it finding?

The reason I ask is this: S3 is very slow at listing objects. A single list request returns at most 1,000 keys, so if your bucket has thousands (or millions) of objects, listing them all can take many minutes or even hours.
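To put rough numbers on that, here is a small sketch of the page arithmetic. It assumes the default page size of 1,000 keys per list response; each next page is requested with a marker or continuation token taken from the previous response, so the pages are fetched one after another. The function name is hypothetical.

```python
import math

# Assumption: the default S3 list page size of 1,000 keys per response.
# Each next page needs the marker/continuation token from the previous
# response, so these requests happen sequentially.
MAX_KEYS_PER_PAGE = 1000


def list_requests_needed(object_count: int) -> int:
    """Minimum number of sequential list requests to enumerate object_count keys."""
    if object_count < 0:
        raise ValueError("object_count must be non-negative")
    # Even an empty prefix costs one request to discover that it is empty.
    return max(1, math.ceil(object_count / MAX_KEYS_PER_PAGE))
```

For example, 26,999 keys need at least 27 round trips and a million keys need at least 1,000. Total listing time is that count multiplied by whatever per-request latency you actually see, which this sketch does not try to guess.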

The way the S3 input plugin works today is basically this:

  1. List all objects in the given prefix.
  2. Process any objects found by that listing.
  3. Update sincedb each time an object completes processing.
  4. Go back to step 1, ignoring any already-processed objects.
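The four steps above can be modeled as a simplified Python sketch. It is not the plugin's code. The names are hypothetical, and sincedb is modeled as an in-memory set of processed keys purely for illustration, which is not the plugin's on-disk format. The point it shows is that the whole listing is materialized before any object is processed, and nothing is recorded in sincedb until the first object finishes.

```python
from typing import Callable, Iterable, List, Set


def run_pass(list_keys: Callable[[], Iterable[str]],
             process: Callable[[str], None],
             sincedb: Set[str]) -> List[str]:
    """One pass of the list -> process -> record loop (simplified model)."""
    # Step 1: consume the *entire* listing before processing anything.
    listed = list(list_keys())

    done = []
    for key in listed:
        # Step 4 (later passes): skip objects that were already processed.
        if key in sincedb:
            continue
        process(key)      # Step 2: read the object and emit its events
        sincedb.add(key)  # Step 3: record progress after each object
        done.append(key)
    return done
```

Calling `run_pass` repeatedly corresponds to going back to step 1. In this model, while a long listing is still in progress, nothing has been processed and the sincedb is still empty, which would be consistent with seeing no data in Elasticsearch and no sincedb file.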

If step 1 lists millions of objects, the listing alone can take an hour (or more), and Logstash waits for the listing to complete before processing (step 2) any of the log files it found in your S3 bucket. This is an annoying issue, I admit, though I don't know for sure whether it is what's affecting you.

I wrote an issue documenting this behavior, along with a proposal for improving it, that might interest you.

Knowing how many objects are under your logs/ prefix would help determine whether your issue matches what I am describing.
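One way to get that count independently of Logstash is a short script using boto3's paginated `list_objects_v2`. This is a sketch that assumes boto3 is installed and credentials are configured. The function name is hypothetical, and the client is passed in so the logic can be exercised without network access.

```python
from typing import Any


def count_objects(s3_client: Any, bucket: str, prefix: str) -> int:
    """Count the keys under `prefix` using S3's paginated listing.

    `s3_client` is expected to be a boto3 S3 client, or any object with the
    same get_paginator("list_objects_v2") interface.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching keys has no "Contents" entry at all.
        total += len(page.get("Contents", []))
    return total
```

With real credentials, the call would look something like `count_objects(boto3.client("s3", region_name="us-east-1"), "mybucket", "logs/")`. The script performs the same sequential listing as the plugin, so on a very large prefix it will also be slow.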