We are ingesting Zscaler logs and have run into a bottleneck, most likely in the Filebeat Kafka output.
Deployment A:
[[ Zscaler NSS ]] -- syslog input --> [[ filebeat ]] -- kafka output --> [[ Event Hub ]] ... [[ Elasticsearch ]]
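For reference, a minimal sketch of the Filebeat configuration behind Deployment A (the endpoint, topic, port, and credentials below are placeholders, not our real values):

```yaml
# filebeat.yml -- Deployment A (sketch; endpoint, topic and credentials are placeholders)
filebeat.inputs:
  - type: syslog
    protocol.udp:
      host: "0.0.0.0:9514"          # port the Zscaler NSS feed sends to

output.kafka:
  # Event Hubs exposes a Kafka-compatible endpoint on port 9093 (SASL PLAIN over TLS)
  hosts: ["<namespace>.servicebus.windows.net:9093"]
  topic: "zscaler-web"              # maps to an Event Hub with the same name
  username: "$ConnectionString"
  password: "<event-hubs-connection-string>"
  ssl.enabled: true
  compression: gzip
  required_acks: 1
```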
This proved not fast enough: the lag on events reaching Elasticsearch grew by 9 hours. What we did was quickly add Logstash into the mix, which significantly increased the throughput.
Deployment B:
[[ Zscaler NSS ]] -- syslog input --> [[ filebeat ]] -- beats output --> [[ logstash ]] -- kafka output --> [[ Event Hub ]] ... [[ Elasticsearch ]]
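On the Filebeat side the only change was the output section; Logstash then listens with its Beats input and publishes with its Kafka output plugin. A sketch (the hostname and values are placeholders):

```yaml
# filebeat.yml -- Deployment B (sketch; the syslog input stays the same, only the output changed)
output.logstash:
  hosts: ["logstash01:5044"]   # placeholder hostname for the Logstash node
  worker: 2                    # parallel connections to Logstash
  bulk_max_size: 2048          # events per batch over the Beats protocol
```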
You can clearly see in this graph how much better the Java Kafka client in Logstash performed compared to the Go Kafka client in Filebeat.
Has anyone experienced something similar? What else can we try when tuning the Filebeat Kafka output? We already tried increasing the workers, and we are thinking of raising bulk_max_size; a sketch of the settings we are looking at is below. Any other suggestions? Thanks, and we appreciate the feedback.
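For concreteness, this is the kind of tuning we are experimenting with on the Kafka output and on Filebeat's internal memory queue; the values below are examples we are testing, not measured optima:

```yaml
# filebeat.yml -- Kafka output tuning knobs (example values, not recommendations)
queue.mem:
  events: 65536              # internal queue must be large enough to keep all workers busy
  flush.min_events: 2048     # hand off batches of this size to the output
  flush.timeout: 1s          # ...or flush after 1s, whichever comes first

output.kafka:
  hosts: ["<namespace>.servicebus.windows.net:9093"]
  topic: "zscaler-web"
  worker: 4                  # parallel publish workers
  bulk_max_size: 2048        # max events per produce request
  channel_buffer_size: 512   # events buffered per Kafka broker
  compression: gzip          # fewer, larger requests on the wire
  required_acks: 1           # leader ack only, don't wait for full replication
```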