I then run it with `/opt/logstash/bin/logstash --path.data /tmp/logstash/data -f config/logstash.conf`. The S3 input seems to start correctly, and if I turn on --debug I can see it finding log objects to process:
```
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] objects length is: {:length=>26999}
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] S3 input: Found key {:key=>"logs/2016-05-21-13-32-20-1F8C0D40595F3975"}
[2018-11-03T06:03:06,273][DEBUG][logstash.inputs.s3 ] S3 input: Adding to objects {:key=>"logs/2016-05-21-13-32-20-1F8C0D40595F3975"}
```
But I'm never seeing any data in Elasticsearch. I added `sincedb_path => "/etc/s3backup/sincedb"` to my .conf file, but I'm not seeing anything in that directory.
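For context, here's a minimal sketch of the kind of config I'm describing. The bucket name, region, and Elasticsearch host are placeholders, not my real values, and the stdout output is there purely as a debugging aid:

```
input {
  s3 {
    bucket       => "example-log-bucket"      # placeholder, not the real bucket
    region       => "us-east-1"               # placeholder region
    prefix       => "logs/"                   # matches the keys in the debug output
    sincedb_path => "/etc/s3backup/sincedb"   # a file path, not a directory
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]               # placeholder endpoint
  }
  # If events show up on stdout but never in Elasticsearch, the problem
  # is on the output side rather than in the S3 input.
  stdout { codec => rubydebug }
}
```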
Have I made an obvious mistake? What else can I do to debug this?
The reason I ask is this: S3 is very slow at listing objects. If your bucket holds thousands (or millions) of objects, it can take many minutes, or even hours, to finish the listing.
The way the S3 input plugin works today is basically this (sketched in code after the list):
1. List all objects under the configured prefix.
2. Process each object found by that listing.
3. Update the sincedb each time an object finishes processing.
4. Go back to step 1, skipping any already-processed objects.
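To make the steps concrete, here's a rough sketch of that loop. It's illustrative Python against the S3 API, not the plugin's actual Ruby code, and the bucket/prefix are made-up names; the point is that step 1 pages through the entire listing (1,000 keys per API call) before step 2 ever touches an object:

```python
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX = "example-log-bucket", "logs/"  # made-up names
processed = set()  # stand-in for the plugin's sincedb bookkeeping

while True:
    # Step 1: list ALL objects under the prefix. With millions of keys this
    # means thousands of sequential ListObjectsV2 calls (1,000 keys each),
    # and nothing below runs until the listing is finished.
    pending = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            if obj["Key"] not in processed:  # step 4: skip already-processed objects
                pending.append(obj["Key"])

    # Step 2: only now does any processing happen.
    for key in pending:
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        # ... decode lines, turn them into events ...
        processed.add(key)  # step 3: record completion (the plugin updates sincedb)
```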
If step 1 has to list millions of objects, the listing alone could take an hour (or more), and Logstash will wait for it to complete before it starts processing (step 2) any of the log files in your S3 bucket. This is an annoying issue, I admit, though I don't know for sure whether it's what is affecting you.