Hello
I have two Logstash servers as front-ends with a Kafka output, and two Logstash servers as back-ends with a Kafka input.
The front-ends put all events into one topic, and in theory both back-ends should read from that topic. In my case only one back-end server actually reads from Kafka.
I also tried setting up a replicated ZooKeeper cluster and a multi-broker cluster with a replicated topic, and the result is the same.
Can you help me figure out what I am doing wrong?
Logstash Kafka Output:
output {
  kafka {
    topic_id => "logstash-logs-replicated"
    broker_list => "xxx.xxx.xxx.xxx:9092"
  }
}
Logstash Kafka Input:
input {
  kafka {
    topic_id => 'logstash-logs-replicated'
  }
}
If you want each back-end Logstash server to read the same Kafka topic on its own (i.e. not in tandem), then you need to set the group_id differently on each system. The Kafka consumer uses the group_id to synchronize what is read from an input, so if you want two independent readers of a topic, each needs its own group_id. Logstash defaults the group_id to logstash, which is why only one of them is doing anything. https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-group_id
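For example, a minimal sketch of what that could look like, assuming the same plugin options as your config above (the group_id values are just illustrative names; with distinct groups each back-end would receive every message in the topic):

On back-end 1:

input {
  kafka {
    topic_id => 'logstash-logs-replicated'
    group_id => 'logstash-backend-1'   # own consumer group, independent offset tracking
  }
}

On back-end 2:

input {
  kafka {
    topic_id => 'logstash-logs-replicated'
    group_id => 'logstash-backend-2'   # own consumer group, independent offset tracking
  }
}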
Thank you, Joe, for the answer.
I tried creating the topic with multiple partitions (for example 10), and then in the Logstash input on each back-end I set consumer_threads to 5. This resolved my issue; a rough sketch of the input is below.
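Roughly what I mean, as a sketch using the same plugin options as the original config (the topic itself was created beforehand with 10 partitions, e.g. via kafka-topics.sh):

input {
  kafka {
    topic_id => 'logstash-logs-replicated'
    # both back-ends keep the default shared group_id ("logstash"),
    # so the 10 partitions are split between them instead of duplicated
    consumer_threads => 5
  }
}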
Actually, with your approach of using a different group_id on each back-end, I would get the same output from Kafka on every back-end:
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers
Apache Kafka