Producer/broker/16 maximum request accumulated, waiting for space kafka output


(Userguy) #1

Hello, I am seeing this message in my Filebeat messages. For now I have set the bulk size to 3072 and will see if there is any improvement.

An additional question: how does the worker setting affect the Kafka output? By default it is 1.

As far as I know, even if I list only 1 host in the Kafka output, it gets all broker IPs back in return. So will it open brokers * workers network connections?

If I increase worker to, say, 3, how will that affect the overall load?


(Steffen Siering) #2

The message is logged because the send buffers are full. The kafka client is waiting for an ACK from kafka for an older batch in order to flush its buffers. So the network or kafka itself might create some backpressure. All in all, data will not be lost, as filebeat will retry until there is enough space. Unfortunately the kafka client used by Beats can be somewhat chatty at times and doesn't differentiate between errors and debug messages.
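To make the "waiting for space" behaviour concrete, here is a toy model in Python. It is only a sketch and not how the kafka client in Beats is implemented: the names `send_buffer`, `try_enqueue` and `ack_oldest` are made up for illustration, and the capacity of 4 is chosen only to keep the example small.

```python
import queue

# Assumption: a tiny capacity so the effect is visible. The real buffer size
# is a configuration setting, not 4.
send_buffer = queue.Queue(maxsize=4)

def try_enqueue(event) -> bool:
    """Return True if the event fit into the buffer, False if it is full."""
    try:
        send_buffer.put_nowait(event)
        return True
    except queue.Full:
        return False

def ack_oldest() -> None:
    """Model an ACK from kafka: the oldest buffered event is released."""
    send_buffer.get_nowait()

accepted = [try_enqueue(i) for i in range(5)]  # the fifth event finds no space
ack_oldest()                                   # an ACK frees one slot
retried = try_enqueue(4)                       # the retry now succeeds
```

Nothing is dropped in this model: the rejected event is simply offered again once an ACK has freed a slot, which matches the retry behaviour described above.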

Increasing the worker count to 3 means 3 kafka clients are used, each publishing events independently. Batches of events are load balanced among the 3 client instances. But how effective this is depends on your queue settings. If there is some quota/rate limiting configured in Kafka or in an intermediate network device, you might not gain much, as your overall bandwidth might be limited.

A kafka client is not sending data to a broker, but to a topic. A topic is split into partitions. Each partition is owned by a broker, plus has replicas among other brokers. One broker is the leader of a partition at any time. The leader of a partition changes to another broker every so often. This means a kafka client connects to a cluster, not to a broker. The connection to the cluster is established using a bootstrap process by querying one of the initially configured brokers for the cluster metadata. Then a connection to each broker is established as required for serving the partitions of said topic. E.g. having 10 brokers means you have 10 connections. Having 3 workers with 10 brokers means you will have 30 TCP connections.
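The connection arithmetic can be written down as a small Python sketch. The function name `max_tcp_connections` is hypothetical, and the result is an upper bound that assumes every client instance ends up talking to every broker.

```python
def max_tcp_connections(brokers: int, workers: int = 1) -> int:
    """Upper bound on TCP connections opened by the kafka output.

    Assumes each client instance (one per worker) connects to every broker,
    which only happens if each broker leads at least one partition in use.
    """
    return brokers * workers

print(max_tcp_connections(10))             # 10
print(max_tcp_connections(10, workers=3))  # 30
```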

Scaling in Kafka is normally established via the number of partitions for a topic (assuming you have no common 'bottleneck' like a NAT/firewall device between Filebeat and Kafka).


(Userguy) #3

Hello, thanks for your reply.

Is there a way to increase the buffer queue? My filebeat does heavy writing to Kafka. Just to add: I understand that the connections are made to brokers, but in my case I am writing to 10+ topics and I have around 40+ brokers. With 1 worker, does that mean it will have 40 connections if I have 40 partitions (since each partition has one leader on one broker)?
Second: if I have 40 brokers and 40 partitions for each of 5 topics, will the kafka client with a single worker use one TCP connection per broker for all topics, depending on where the leaders are? Could that be why my buffer is getting full?


(Steffen Siering) #4

For queue buffer see Filebeat Queue docs.

The kafka output collects a batch from the queue and then splits the batch into topics and partitions. The batch size is configured via output.kafka.bulk_max_size. Having multiple topics and partitions, you have to think of them as pairs (topic + partition). Having 5 topics and 40 partitions per topic gives you a total of 200 'targets'. In Beats each broker has its own send buffer (and separate worker). That is, the 200 'targets' are somewhat distributed among the 40 brokers you have. Depending on its target, an event is pushed into the send buffer of the correct broker. Each broker's buffer is configured via output.kafka.channel_buffer_size (default 256). It is this send buffer which eventually gets filled up.
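As a rough illustration of how the 200 targets spread over 40 brokers, here is a Python sketch. The `leader_of` function is an assumption made for the example: it places partition p of every topic on broker p, whereas real leader placement is decided by the Kafka cluster and changes over time.

```python
from collections import Counter
from itertools import product

TOPICS = 5
PARTITIONS_PER_TOPIC = 40
BROKERS = 40

# Every (topic, partition) pair is one publishing target.
targets = list(product(range(TOPICS), range(PARTITIONS_PER_TOPIC)))

def leader_of(topic: int, partition: int) -> int:
    # Hypothetical placement: partition p of every topic is led by broker p.
    return partition % BROKERS

# Count how many targets feed each broker's send buffer.
targets_per_broker = Counter(leader_of(t, p) for t, p in targets)

print(len(targets))           # 200
print(targets_per_broker[0])  # 5
```

With this even placement each broker's send buffer is shared by 5 targets. An uneven placement would concentrate more targets on fewer buffers, and those buffers would fill up first.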

The kafka output has a separate IO thread per broker. That is, having 40 brokers and 1 worker configured, you actually already have a set of 40 asynchronous workers. By setting worker: 3 you will end up with a total of 120 IO workers. The worker setting configures the number of kafka client instances. A worker collects a batch of events from the queue and adds it to the kafka client instance, which finally schedules the per-broker events.
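The same numbers as a tiny Python sketch. The helper name `io_workers` is made up for illustration and simply multiplies client instances by brokers as described above.

```python
def io_workers(worker_setting: int, brokers: int = 40) -> int:
    # One kafka client instance per configured worker, and one
    # asynchronous IO worker per broker inside each client instance.
    return worker_setting * brokers

print(io_workers(1))  # 40
print(io_workers(3))  # 120
```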


(Userguy) #5

Thanks for your detailed explanation. This makes it much clearer.