Hello Filebeat developers and users, I am wondering what your experience is with Filebeat output throughput.
If I interpret the Filebeat log below correctly, our throughput to Kafka is only about 53 KB/s (libbeat.outputs.kafka.bytes_write=1610265 bytes over the 30 s reporting window ≈ 53 KB/s).
We pin Filebeat to 2 CPU cores on a 52-core machine (shared with other lightweight processes on those two cores). We have sometimes observed 15-45 minute delays before Filebeat ships logs out to Kafka, so the low throughput does not seem to be caused by a shortage of logs to send. The log line below shows the largest libbeat.outputs.kafka.bytes_write value we have seen.
Is this speed expected? Thanks!
2019-03-08T18:16:52Z INFO Non-zero metrics in the last 30s: beat.info.uptime.ms=30001 beat.memstats.gc_next=863930544 beat.memstats.memory_alloc=469037928 beat.memstats.memory_total=411464408889168 filebeat.events.active=36 filebeat.events.added=9234 filebeat.events.done=9198 filebeat.harvester.open_files=104 filebeat.harvester.running=104 libbeat.config.module.running=0 libbeat.output.events.acked=9198 libbeat.output.events.active=104 libbeat.output.events.batches=30 libbeat.output.events.total=9302 libbeat.outputs.kafka.bytes_read=35850 libbeat.outputs.kafka.bytes_write=1610265 libbeat.pipeline.clients=1 libbeat.pipeline.events.active=134 libbeat.pipeline.events.published=9234 libbeat.pipeline.events.total=9234 libbeat.pipeline.queue.acked=9198 registrar.states.current=126 registrar.states.update=9198 registrar.writes=29
Which Filebeat version? How many files are harvested concurrently? Can you share configs as well?
Have you set max_procs: 2? If not, Filebeat will create 52 OS threads, which the OS has to reschedule onto only 2 CPUs. In the worst case those 2 cores are even the same physical core with hyperthreading.
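For reference, this is roughly what that looks like in filebeat.yml (a minimal sketch; the value 2 simply matches the two pinned cores):

    # filebeat.yml (illustrative snippet)
    # Limit the Go runtime to 2 OS threads so it matches the 2 pinned cores.
    max_procs: 2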
What is the CPU usage of filebeat?
Do you read from local disks? What throughput do you get when sending to the console output? Ideally run this with a small test setup, with separate logs and a separate registry file: filebeat -c filebeat-test.yml | pv -Warl >/dev/null. What is the CPU usage when sending to console?
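In case it helps, a minimal sketch of what such a filebeat-test.yml could look like (the paths are made up for illustration; the separate path.data keeps the test registry away from the real one):

    # filebeat-test.yml -- illustrative test config, not the production one
    filebeat.prospectors:              # 6.x still calls inputs "prospectors"
      - type: log
        paths:
          - /var/log/test/*.log        # hypothetical test log path
    path.data: /tmp/filebeat-test      # separate data dir => separate registry file
    output.console:
      pretty: false
    # run with: filebeat -c filebeat-test.yml | pv -Warl >/dev/null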
Depending on your settings, the kafka output might compress events. In that case libbeat.outputs.kafka.bytes_write=1610265 is the compressed size on the wire, not the raw event bytes.
libbeat.output.events.batches=30 indicates that about 1 batch of events is sent per second, which is a rather low rate. A little over 9k events were ACKed in that 30 s window (roughly 300 events per second). This suggests that the average batch size is only a little over 300 events.
Hi @steffens, thanks for the tips! Please see my replies below:
The Filebeat version is 6.1.3, with up to ~215 files being harvested concurrently. Please see our current Filebeat config file below.
(1) We did not set max_procs: 2. I will add it and see whether it makes a big difference.
(2) The Filebeat CPU usage goes up to ~90%.
(3) Yes, we read from local disks (mounted SSD).
(4) Thanks for the explanation; it is good to know that libbeat.outputs.kafka.bytes_write is the compressed byte count. But even assuming a 1:10 compression ratio, ~530 KB/s is still not much, given that SSD read speeds are in the hundreds of MB/s and we have a 25G NIC.
(5) "This seems to suggest that the aver batch size is only a little over 300 events." What does this imply?
You have configured 20 OS threads, but Filebeat is only at ~90% CPU, so it looks like most threads are just idling; not many output workers are active.
"(3) Yes, we read from local disks (mounted SSD)."
"(4) Thanks for the explanation; it is good to know that libbeat.outputs.kafka.bytes_write is the compressed byte count. But even assuming a 1:10 compression ratio, ~530 KB/s is still not much, given that SSD read speeds are in the hundreds of MB/s and we have a 25G NIC."
Have you tested with the console output? Backpressure can happen for many reasons. When tuning/debugging, try to remove as much noise as possible from your tests.
In my experience the problem is usually not reading the files, but the network/remote service itself.
"This seems to suggest that the aver batch size is only a little over 300 events." What does this imply?
I guess that your settings are not doing what you think they are doing. Filebeat has a shared memory queue used to collect events. This queue is split into smaller batches according to output.kafka.bulk_max_size for asynchronous processing. The output worker then distributes the events of a batch onto the brokers' send queues, one per topic/partition pair. E.g. with a ~300-event batch and 10 brokers, each send queue gets about 30 events. A send queue is eventually sent, and we eventually get an ACK from Kafka, while we continue filling the send queues. It's a pretty dynamic, asynchronous process.
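To make the knobs in that description concrete, here is a rough sketch of where they live in a 6.x config; the values shown are placeholders (roughly the documented 6.x defaults), not tuning recommendations, and the broker/topic names are made up:

    queue.mem:
      events: 4096               # size of the shared memory queue
      flush.min_events: 2048     # forward a batch once this many events are buffered...
      flush.timeout: 1s          # ...or after this timeout, whichever comes first

    output.kafka:
      hosts: ["kafka1:9092"]     # placeholder broker list
      topic: "my-topic"          # placeholder topic
      bulk_max_size: 2048        # max events per batch handed to the Kafka client
      compression: gzip          # compression is why bytes_write is smaller than the raw event bytes

Please double-check the option names against the reference docs for your exact Filebeat version before changing anything.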
As Filebeat 5.x and 6.x behave differently, I was wondering which Filebeat version you are using; in any case I'd recommend Filebeat 6.
At the moment these metrics were written you had 104 files open, but only 36 events waiting to be sent (filebeat.events.active=36). Some more logs would be helpful.