Hi,
I have an issue where the S3 input plugin doesn't seem to propagate the processed data at all. I can see that the files in my bucket are being processed:
...
logstash_1 | [2018-03-05T13:09:36,102][DEBUG][logstash.inputs.s3 ] S3 input processing {:bucket=>"idd-data", :key=>"test10/ls.s3.a8ea640e-1080-428c-930e-27eb19499027.2018-02-08T08.05.part0.txt.gz"}
logstash_1 | [2018-03-05T13:09:36,102][DEBUG][logstash.inputs.s3 ] S3 input: Download remote file {:remote_key=>"test10/ls.s3.a8ea640e-1080-428c-930e-27eb19499027.2018-02-08T08.05.part0.txt.gz", :local_filename=>"/tmp/logstash/ls.s3.a8ea640e-1080-428c-930e-27eb19499027.2018-02-08T08.05.part0.txt.gz"}
logstash_1 | [2018-03-05T13:09:36,206][DEBUG][logstash.inputs.s3 ] Processing file {:filename=>"/tmp/logstash/ls.s3.a8ea640e-1080-428c-930e-27eb19499027.2018-02-08T08.05.part0.txt.gz"}
...
The problem is that I have a huge number of files for Logstash to process, and the processed data doesn't seem to get propagated as I'd expect.
I expect the processed data to be sent to Elasticsearch in "smaller" batches. My feeling is that the input plugin doesn't send the data down the pipeline until all the files have been processed. I've so far waited more than two hours, but no data has appeared, and the input plugin is still processing the input files.
I'm using the ELK stack, version 6.2.2, for this, running in Docker containers via docker-compose.
I'm not sure what I can or should do about this. I cannot, for example, find any option that tells the plugin to push data down the pipeline either once all files in the current round have been processed, or once, say, 1000 entries/lines have been received.
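To make it concrete, this is the kind of option I was hoping to find on the input. The option names here are purely hypothetical, for illustration only; as far as I can tell nothing like them actually exists:

input {
  s3 {
    # (same settings as in the real pipeline below)
    region => "eu-west-1"
    bucket => "idd-data"
    codec  => json_lines
    # Hypothetical options, just to illustrate the behaviour I'm after:
    # flush_after_lines     => 1000   # push events downstream after every 1000 lines
    # flush_after_each_file => true   # or at least flush after each finished file
  }
}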
This is my pipeline:
input {
  s3 {
    region => "eu-west-1"
    bucket => "idd-data"
    prefix => "${bucket_prefix:}"
    codec  => json_lines
  }
}

filter {
}

output {
  elasticsearch {
    hosts => "http://elasticsearch:9200"
  }
  stdout {
    codec => rubydebug
  }
}
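In case it's relevant, one thing I'm considering as a sanity check is adding a file output next to the existing ones, just to see whether any events leave the pipeline at all, independent of the Elasticsearch output. This is only a sketch and the path is arbitrary:

output {
  elasticsearch {
    hosts => "http://elasticsearch:9200"
  }
  stdout {
    codec => rubydebug
  }
  # Debug only: write every event to a local file so I can see whether
  # anything actually comes out of the pipeline.
  file {
    path => "/tmp/logstash-s3-debug-%{+YYYY-MM-dd}.log"
  }
}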
Any suggestions?