Logstash-s3-plugin not importing data or creating sincedb file

tom.isaacson · November 3, 2018, 7:02am

I have an S3 bucket containing logs from other S3 buckets showing downloads, etc. I pulled the Docker image, then installed the S3 input plugin (see "S3 input plugin" in the Logstash Reference). I created a logstash.conf file:

input {
    s3 {
      access_key_id => "XXXX"
      secret_access_key => "XXXX"
      bucket => "mybucket"
      region => "us-east-1"
      prefix => "logs/"
      type => "s3"
    }
}

output { 
  elasticsearch { hosts => ["localhost:9200"] }
}

I then run this with "/opt/logstash/bin/logstash --path.data /tmp/logstash/data -f config/logstash.conf". The S3 input seems to start correctly, and if I turn on --debug I can see it finding objects:

[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] objects length is: {:length=>26999}
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] S3 input: Found key {:key=>"logs/2016-05-21-13-32-20-1F8C0D40595F3975"}
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] S3 input: Adding to objects {:key=>"logs/2016-05-21-13-32-20-1F8C0D405

but I'm never seeing any data in Elasticsearch. I added:
sincedb_path => "/etc/s3backup/sincedb"
to my .conf file, but no sincedb file ever appears at that path.

Have I made an obvious mistake? What else can I do to debug this?

jordansissel · November 14, 2018, 4:16am

For these logs, how many objects is it finding?

The reason I ask is this: listing objects in S3 is very slow. If your bucket has thousands (or millions) of objects, it can take many minutes or even hours to finish listing.

The way the S3 input plugin works today is basically this:

1. List all objects in the given prefix.
2. Process any objects found by that listing.
3. Update sincedb each time an object completes processing.
4. Go back to step 1, ignoring any already-processed objects.
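
As a rough illustration of that loop, here is a minimal Python sketch. It is not the plugin's actual code (the plugin is written in Ruby). The names poll_once, list_objects and process are hypothetical, and modelling sincedb as a single "last modified" timestamp is a simplifying assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for one entry of an S3 listing: (key, last_modified).
S3Object = Tuple[str, float]


def poll_once(
    list_objects: Callable[[str], Iterable[S3Object]],
    process: Callable[[str], None],
    prefix: str,
    sincedb: Dict[str, float],
) -> List[str]:
    """Run one pass of the list -> process -> update-sincedb loop."""
    last_seen = sincedb.get("last_modified", 0.0)

    # Step 1: the whole listing is collected before any processing starts.
    listing = list(list_objects(prefix))

    # Step 4 (on repeat passes): skip anything already recorded as processed.
    pending = sorted(
        (obj for obj in listing if obj[1] > last_seen),
        key=lambda obj: obj[1],
    )

    processed = []
    for key, modified in pending:
        process(key)                         # Step 2: handle one object
        sincedb["last_modified"] = modified  # Step 3: record progress
        processed.append(key)
    return processed
```

The point to notice is that nothing reaches process(), and sincedb is never written, until list() in step 1 has returned. With a very large prefix that can look exactly like "no data and no sincedb file".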

If step 1 would list millions of objects, then the act of listing could take an hour (or more), and Logstash would wait for the listing to complete before processing (step 2) any log files found in your S3 bucket. This is an annoying issue, I admit, though I don't know for sure if this is what impacts you.

I wrote an issue documenting this phenomenon, along with a proposal for improving it, that might interest you.

Knowing how many objects your logs/ prefix has would help determine whether or not your issue matches what I am describing.
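
One way to get that count, sketched in Python: the helper name count_objects is hypothetical, and it assumes a boto3-style S3 client is passed in (boto3 is a third-party package and would need credentials configured). It pages through list_objects_v2 and adds up the keys on each page.

```python
from typing import Any


def count_objects(s3_client: Any, bucket: str, prefix: str) -> int:
    """Count keys under a prefix by paging through list_objects_v2."""
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no keys.
        total += len(page.get("Contents", []))
    return total


# Usage with boto3 (assumed installed and configured):
#   import boto3
#   client = boto3.client("s3", region_name="us-east-1")
#   print(count_objects(client, "mybucket", "logs/"))
```

Timing this call also gives a rough idea of how long the listing step alone takes for that prefix.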

system · December 12, 2018, 4:16am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
