Is there a good example of setting message_key for the Logstash Kafka output so that messages are balanced across the partitions on the Kafka server side? I have 3 partitions in the topic and am running 3 Logstash instances. I would like each Logstash instance to read messages off a different partition, but it looks like all the messages are going to the same partition.
I am using the timestamp as the message key on the client side. The order of messages is not critical.
In this case, without message_key specified, the messages are distributed across the 3 partitions.
Test B) I have Logstash installed on a cluster of servers. The following is a snippet of the Logstash conf file:
input {
  file {
    type => "zookeeper"
    path => "/home//logs/zookeeper.out."
    exclude => ".gz"
  }
  file {
    type => "cassandra"
    path => "/home//logs/cassandra/system*"
  }
  .........
}
filter {
  .....
  grok {
    match => {
      "message" => "(?<message_hash_key>[\d\w\s\W]{30})"
    }
  }
}
output {
  kafka {
    bootstrap_servers => "10.4.78.31:9092"
    topic_id => "kafka-logs-new"
    #message_key => "%{message_hash_key}"
  }
}
Since I cannot use "@timestamp" in "message_key" in the output, I just parse
the first 30 characters of the log message, which contain the timestamp plus
other information.
In test B, all the messages go to the same partition. My goal is to distribute the logs among the 3 partitions so that 3 logstash servers can read the messages off 3 partitions.
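To make the expected behavior concrete, here is a minimal Python sketch of keyed partitioning in general: hash the key, then take the result modulo the partition count. It is only an illustration under stated assumptions. CRC32 stands in for the real hash and is not what the Kafka producer uses, and the names pick_partition and distribution and the sample log lines are made up for this example. It shows that a key which never varies sends every message to one partition, while keys that differ per message spread out. One case where the key might never vary is when the message_hash_key field is missing from an event, since Logstash then leaves the "%{message_hash_key}" reference as literal text; I have not confirmed that this is what happens in test B.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 3  # the topic in the question has 3 partitions


def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition by hashing it and taking the remainder.

    CRC32 is an illustrative stand-in, NOT the hash Kafka's producer uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribution(keys):
    """Count how many keys land on each partition."""
    return Counter(pick_partition(k) for k in keys)


# Case 1: the key is identical for every message, for example an
# unresolved field reference that stays as literal text.
constant_keys = ["%{message_hash_key}"] * 1000

# Case 2: the key differs per message (hypothetical 30-ish character
# log prefixes containing a timestamp plus other information).
varying_keys = [
    f"2016-01-01 12:00:{i % 60:02d},{i:03d} INFO line {i}" for i in range(1000)
]

print("constant key:", dict(distribution(constant_keys)))
print("varying keys:", dict(distribution(varying_keys)))
```

If the real setup behaves like the first case, a quick check would be to print a few events and see whether message_hash_key is actually set and differs from event to event.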