Hi there,
We have Metricbeat, Filebeat, Logstash and Elasticsearch running on an OpenShift cluster.
We use Logstash with multiple pipelines to filter the logs for certain namespaces and Beats types. The stack is working, but with Logstash sitting between the Beats instances and Elasticsearch, the number of events sent to ES drops considerably.
For example, if I set the namespace filter directly in the Metricbeat config and send to the Elasticsearch endpoints (so without Logstash in between), it sends around 40,000 events a minute for this namespace.
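For reference, that direct filter is just a drop_event processor in the Metricbeat config, along these lines (minimal sketch; the namespace value here is a placeholder):

processors:
  # keep only events from the namespace we care about, drop everything else
  - drop_event:
      when:
        not:
          equals:
            kubernetes.namespace: "my-namespace"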
With a Logstash pipeline processing the Metricbeat events and containing a condition scoped to the same namespace, it ends up sending at most 2,000 events every 10 minutes.
I'm not sure where this 10-minute delay comes from, but the events are sent strictly every 10 minutes; setting "pipeline.batch.delay" to 5 ms or 100 ms has no effect.
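For context, the one-metricbeat.conf pipeline looks roughly like the following (heavily simplified sketch; the field name, namespace value and Elasticsearch hosts are placeholders, and it assumes events arrive from the beats-server pipeline via pipeline-to-pipeline, as sketched further down):

input {
  # events forwarded from the beats-server pipeline
  pipeline { address => "one_metricbeat" }
}
filter {
  # keep only the namespace we are interested in
  if [kubernetes][namespace] != "my-namespace" {
    drop { }
  }
}
output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    index => "metricbeat-%{+YYYY.MM.dd}"
  }
}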
Below is our pipelines.yml, with the different parameters we have tried (we have experimented with many configurations and values):
pipelines.yml: |
  - pipeline.id: one_metricbeat
    path.config: "/usr/share/logstash/pipeline/one-metricbeat.conf"
    queue.type: persisted
    pipeline.workers: 4
    pipeline.batch.size: 1000
    pipeline.batch.delay: 5
    queue.checkpoint.writes: 4096
  - pipeline.id: one_filebeat
    path.config: "/usr/share/logstash/pipeline/one-filebeat.conf"
    queue.type: persisted
    pipeline.workers: 4
    pipeline.batch.size: 1000
    pipeline.batch.delay: 5
    queue.checkpoint.writes: 4096
  - pipeline.id: beats-server
    path.config: "/usr/share/logstash/pipeline/beats.conf"
    #queue.type: persisted
    pipeline.workers: 8
    pipeline.batch.size: 1000
    pipeline.batch.delay: 5
    queue.checkpoint.writes: 4096
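The beats-server pipeline essentially just receives the Beats traffic and forwards it to the two pipelines above (again a heavily simplified sketch; the port and routing conditions are placeholders):

input {
  beats { port => 5044 }
}
output {
  # route events to the matching downstream pipeline
  if [@metadata][beat] == "metricbeat" {
    pipeline { send_to => ["one_metricbeat"] }
  } else if [@metadata][beat] == "filebeat" {
    pipeline { send_to => ["one_filebeat"] }
  }
}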
I thought we might have a performance problem with Logstash, but the pod has enough resources and the JVM heap we have set is fairly generous (min and max 4 GB). What's more, the CPU and memory usage of the Logstash pod isn't that high.
We are currently running version 8.0.0 for the whole stack, but we encountered the same behaviour with the older versions 7.15.0 and 7.16.2.
Do you have any idea how we can solve this?
Many thanks in advance,
Kind Regards,