I am using filebeat to collect logs and send them to Kafka.
At 1000 events/sec, the CPU usage is 4%.
At 9000+ events/sec, the CPU usage reaches 500+%.
This is on an AWS EC2 instance with 24 cores.
Can you share your filebeat config, please? Any filebeat logs?
filebeat config:
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    - input_type: log
      paths:
        - /opt/xxxx.log
      tail_files: true

output:
  kafka:
    hosts: ["foo.com:9092"]
    topic: "foo_staging"
    use_type: false
    client_id: "tests"

    # The number of workers to use for this output.
    worker: 2

    # The maximum number of events to bulk in a single request.
    bulk_max_size: 16384

    # The number of seconds to wait for responses from the kafka brokers
    # before timing out.
    broker_timeout: "30s"

    # KeepAlive specifies the keep-alive period for an active network
    # connection. If zero, keep-alives are disabled (default is 0: disabled).
    keep_alive: 0

    # The maximum amount of time the server will wait for acknowledgments
    # from followers to meet the acknowledgment requirements the producer
    # has specified with the required_acks configuration.
    timeout: 30

    # ACK reliability level required from broker. 0=no response, 1=wait for
    # local commit, -1=wait for all replicas to commit. The default is 1.
    required_acks: 0

    # The number of seconds to wait for new events between two producer API calls.
    flush_interval: 5

logging:
  files:
    rotateeverybytes: 10485760 # = 10MB
The higher the throughput, the higher the CPU usage, since filebeat is reading and encoding more lines. But I have no idea what's going on here, or anything about your test setup.
You have one filebeat sending to how many kafka nodes? How many partitions does your topic have?
How do you measure throughput and generate the different loads (1000/sec vs. 9000+/sec)?
Can you set the log level to warning in filebeat.yml and check the log file for any errors/warnings?
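For reference, a minimal sketch of the logging section with the level raised to warning (the file path and name here are illustrative, not taken from your setup):

logging:
  # Only log messages at warning level and above.
  level: warning
  to_files: true
  files:
    path: /var/log/filebeat
    name: filebeat.log
    rotateeverybytes: 10485760 # = 10MB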
Having set worker: 2 configures filebeat to double the number of Kafka clients and worker routines doing IO.
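If the extra client isn't buying you throughput, it may be worth testing with a single worker and comparing CPU usage at the same event rate; a minimal sketch of just the relevant part of the output section (hosts/topic copied from your config above):

output:
  kafka:
    hosts: ["foo.com:9092"]
    topic: "foo_staging"
    # A single Kafka client/worker routine doing the IO; increase only if
    # one producer cannot keep up with the brokers.
    worker: 1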