I have a setup where Logstash reads Kubernetes logs from 20 different S3 buckets and sends them to ELK. The logs seem to arrive in ELK 3-5 minutes late. Logstash runs in Docker on a VM with 31GB Xms/Xmx.
I am using one pipeline. I tried 6 pipelines with 2-3 buckets each and got duplicate/triplicate events from each pipeline.
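I suspect the duplicates came from the same bucket being listed in more than one pipeline: with delete => true, two pipelines polling the same bucket can both download an object before either one deletes it. If I retry the split, each bucket should appear in exactly one pipeline, something like this (pipeline ids and config paths are placeholders):

```
# pipelines.yml -- one pipeline per group of buckets, no bucket listed twice
- pipeline.id: s3-group-1
  path.config: "/usr/share/logstash/pipeline/s3-group-1.conf"   # buckets 1-4
- pipeline.id: s3-group-2
  path.config: "/usr/share/logstash/pipeline/s3-group-2.conf"   # buckets 5-8
```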
How can I speed up Logstash ingestion from S3 buckets?
The buckets may have had lots of files initially, but now we have a regular number of files. I have delete => true to delete files after processing. The VM has 10 CPUs assigned. I started with 16GB for the JVM and kept increasing it.
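For reference, each bucket is read with an input roughly like this (the bucket name and region are placeholders; interval is how often the plugin lists the bucket for new objects):

```
input {
  s3 {
    bucket   => "k8s-logs-cluster-a"   # placeholder bucket name
    region   => "us-east-1"            # placeholder region
    prefix   => "cluster_name/"
    delete   => true   # delete each object after it is processed
    interval => 60     # seconds between bucket listings
  }
}
```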
@leandrojmp do I need to change the way the Kubernetes clusters save logs to the buckets to lower the number of files? Right now files are saved in each bucket as "cluster_name/yyyy/mm/dd/yyyymmddxxxxxx__yy.gz".
If you can reduce the number of files, I think you should try to do it.
The Logstash s3 input has a couple of issues when working with buckets that contain a lot of files.
Personally, I do not use this input because the performance was pretty bad in my use case (logs from AWS services) and I was not able to fix or improve it, so I ended up writing a custom collector.