I'm trying to process CSV files stored in an S3 bucket using Logstash. Everything works fine until it reaches the last file, for which it keeps creating entries in Elasticsearch endlessly.
The data is in daily time buckets, and each CSV file contains one day's data (grouped by various things). Watching the document count in the Discover section of Kibana, a normal day shouldn't hold much more than 100,000 documents, but the most recent day keeps climbing far into the millions until I stop Logstash.
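For what it's worth, this is roughly how I've been double-checking the total count outside Kibana (a quick Python sketch; it assumes the Elasticsearch instance at localhost:9200 with no authentication and the "b" index from the config below):

import requests

# Sanity check outside Kibana: total document count in the "b" index.
# Assumes Elasticsearch at localhost:9200 with no authentication.
resp = requests.get("http://localhost:9200/b/_count")
resp.raise_for_status()
print(resp.json()["count"])  # compare against (days ingested so far) * ~100,000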
As a troubleshooting step, I removed the filter block from my config, and the document count on that index still climbs far higher than it should. So the problem isn't my filters; it's more likely something about how I've configured the S3 input.
Here is the reduced config, with sensitive information redacted:
input {
  s3 {
    type => "b"
    endpoint => "<s3-compatible storage URL>"
    access_key_id => "<redacted>"
    secret_access_key => "<redacted>"
    bucket => "b"
    sincedb_path => "/var/lib/logstash/plugins/inputs/s3/b.sincedb"
  }
}

output {
  if [type] == "b" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "b"
    }
  }
}
I've tried both with and without manually specifying sincedb_path, and the behaviour is the same either way.
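When the path is set explicitly, I've been checking whether the sincedb actually advances with something like this (a small Python sketch; my understanding is that the plugin records the last-modified timestamp of the newest object it has processed, but I may be off on that):

from pathlib import Path

# Dump whatever the S3 input has recorded in its sincedb (path from the input block above).
# If I'm reading the plugin right, this should move forward as new files are picked up.
sincedb = Path("/var/lib/logstash/plugins/inputs/s3/b.sincedb")
print(sincedb.read_text() if sincedb.exists() else "no sincedb yet")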
I don't see what would cause something like this unless there's something fundamental I'm misunderstanding about the S3 plugin. Any thoughts?