Kafka output is very slow with dynamic fields (sprintf)

I have a Logstash 2.2.x process handling 6000 events per second. It has the following Kafka output configured:

kafka {
  bootstrap_servers => "kafkaserver:9092"
  topic_id => "mytopic"
}

When we change the topic_id to use a dynamic expression such as:

kafka {
  bootstrap_servers => "kafkaserver:9092"
  topic_id => "%{[fields][topic]}"
}

Throughput drops from 6000 events/sec to 100 events/sec, and we start getting "Beats input: the pipeline is blocked, temporary refusing new connection". We are using a machine with 4 cores and 4 workers (Logstash is launched with -w 4 -b 512).

Is that kind of degradation normal with dynamic fields? It looks like a bug to me, because other steps in the pipeline (e.g. filters) use the same expression without much effect on overall performance. Other outputs (e.g. "file") can contain the same expression without any problems.

Thank you.

Sorry, my fault. Some events didn't have [fields][topic], so the %{[fields][topic]} reference in topic_id could not be resolved for them.

Protecting it with an if fixes the issue:

if [fields][topic] {
  kafka { ... }
}
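
Note that with this guard, events without [fields][topic] are not sent to Kafka at all. If those events should still be kept, one option is an else branch that sends them to a fixed topic. The following is only a sketch under that assumption: the topic name "unrouted" is a made-up placeholder, and the kafka settings are copied from the question rather than from a tested setup.

```
output {
  if [fields][topic] {
    # Field is present, so the sprintf reference resolves to a real topic name
    kafka {
      bootstrap_servers => "kafkaserver:9092"
      topic_id => "%{[fields][topic]}"
    }
  } else {
    # Field is missing: send to a fixed fallback topic
    # ("unrouted" is a hypothetical name, pick whatever suits your setup)
    kafka {
      bootstrap_servers => "kafkaserver:9092"
      topic_id => "unrouted"
    }
  }
}
```

Writing the fallback events to a separate topic also makes it easy to see how many events arrive without the field.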
