Logstash S3 input slow ingestion

I have a setup where Logstash reads Kubernetes logs from 20 different S3 buckets and sends them to ELK. The logs seem to arrive in ELK 3-5 minutes late. Logstash runs in Docker on a VM with 31 GB Xms/Xmx.

I am using one pipeline. I tried 6 pipelines with 2-3 buckets each and got duplicate/triplicate events from each pipeline.
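For what it's worth, duplicate events from multiple pipelines usually mean the pipelines' configurations overlapped on the same buckets, so each pipeline independently listed and ingested the same objects. A sketch of a `pipelines.yml` that splits the buckets into disjoint groups (the pipeline ids and config paths here are made up):

```yaml
# pipelines.yml — one entry per pipeline, each pointing at its own config file.
# Each config file should list a disjoint subset of the 20 buckets so no two
# pipelines ever read the same bucket.
- pipeline.id: s3-group-1
  path.config: "/usr/share/logstash/pipeline/s3-group-1.conf"
- pipeline.id: s3-group-2
  path.config: "/usr/share/logstash/pipeline/s3-group-2.conf"
```

With disjoint bucket lists, adding pipelines scales out the polling without multiplying events.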

How can I speed up Logstash ingestion from S3 buckets?


Do these buckets have a lot of files in them?

That heap seems way too large; Logstash is more CPU-bound than memory-bound. What is the CPU count for your Logstash container?

The buckets may have had lots of files initially, but now we have a regular number of files. I have `delete => true` to delete each file after processing. The VM has 10 CPUs assigned. I started with 16 GB for the JVM and kept increasing it.
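For context, the relevant part of an s3 input with post-processing deletion looks roughly like this (the bucket name and region are placeholders, not from this thread):

```
input {
  s3 {
    bucket   => "my-cluster-logs"   # placeholder bucket name
    region   => "us-east-1"         # placeholder region
    delete   => true                # delete each object once it has been processed
    interval => 60                  # seconds between bucket listings (plugin default)
  }
}
```

Note that with `delete => true` the full cycle per poll is: list the bucket, download and process each object, then delete it, so a large backlog of keys slows every listing pass.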

Steady state shows ~80-90 MB per bucket.

@leandrojmp do I need to change the way the Kubernetes clusters save logs to the buckets to lower the number of files? Right now files get saved in each bucket as "cluster_name/yyyy/mm/dd/yyyymmddxxxxxx__yy.gz".

If you can reduce the number of files, I think you should try to do it.

The Logstash s3 input has a couple of known issues when working with buckets that contain a lot of files.
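One mitigation worth trying, given the `cluster_name/yyyy/mm/dd/...` key layout mentioned above: the s3 input's `prefix` option restricts each listing to one key prefix, so a pipeline scoped to a single cluster only enumerates that cluster's objects instead of the whole bucket. A hedged sketch (bucket and cluster names are placeholders):

```
input {
  s3 {
    bucket => "k8s-logs"      # placeholder bucket name
    prefix => "cluster-a/"    # only list keys under this cluster's prefix
    delete => true
  }
}
```

This does not reduce the total number of objects, but it shrinks each LIST operation, which is where this input tends to spend its time on large buckets.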

Personally I do not use this input because the performance is pretty bad in my use case (logs from AWS services) and I was not able to fix or improve it, so a custom collector was needed.

@leandrojmp what do you recommend for temporary storage of Kubernetes logs until Logstash pulls them?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.