Logstash: 2.0.0-1.noarch.rpm
Kafka: 2.11-0.9.0.0
According to the Kafka docs, isn't partitioning supposed to be round-robin if no message key is supplied?
Test A) I have a script that converts the log messages and then pipes them to Logstash.
input {
  stdin {
    codec => "json"
  }
}
filter {
  date {
    match => [ "timestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
    locale => en
  }
}
output {
  kafka {
    #broker_list => "10.4.78.31:9092"
    bootstrap_servers => "10.4.78.31:9092"
    acks => "0"
    topic_id => "kafka-logs-new"
    message_key => "%{timestamp}"
  }
}
In this case, without message_key specified, the messages are distributed across the 3 partitions.
Test B) I have Logstash installed on a cluster of servers. The following is a snippet of the Logstash conf file:
input {
  file {
    type => "zookeeper"
    path => "/home//logs/zookeeper.out."
    exclude => ".gz"
  }
  file {
    type => "cassandra"
    path => "/home//logs/cassandra/system*"
  }
  .........
}
filter {
  .....
  grok {
    match => {
      "message" => "(?<message_hash_key>[\d\w\s\W]{30})"
    }
  }
}
output {
  kafka {
    bootstrap_servers => "10.4.78.31:9092"
    topic_id => "kafka-logs-new"
    #message_key => "%{message_hash_key}"
  }
}
Since I cannot use "@timestamp" in "message_key" in the output, I just parse
the first 30 characters of the log message, which contain the timestamp plus
other information.
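To make the behaviour I am expecting concrete, here is a minimal Python sketch of how a producer might choose a partition: round-robin when a message has no key, and a hash of the key modulo the partition count when it has one. This is only an illustration under stated assumptions: `assign_partitions` is a hypothetical helper, and CRC32 is a stand-in for whatever hash the real Kafka partitioner uses, so the actual partition numbers will differ.

```python
import itertools
import zlib
from collections import Counter

NUM_PARTITIONS = 3  # the topic in this post has 3 partitions


def assign_partitions(keys, num_partitions=NUM_PARTITIONS):
    """Count how many messages land on each partition.

    `keys` is an iterable of message keys; None means "no key supplied".
    """
    round_robin = itertools.cycle(range(num_partitions))
    counts = Counter()
    for key in keys:
        if key is None:
            # No key: rotate through the partitions in turn.
            partition = next(round_robin)
        else:
            # Keyed: hash the key bytes. CRC32 is an assumption used for
            # illustration only, not Kafka's actual hash function.
            partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        counts[partition] += 1
    return counts


# Nine messages without a key are spread evenly over the partitions.
print(assign_partitions([None] * 9))

# Nine messages sharing one key all end up on a single partition.
print(assign_partitions(["2015-12-01 10:00:00,000 INFO "] * 9))
```

In this model, identical keys always land on the same partition, so how well keyed messages spread out depends entirely on how much the keys vary.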
In Test B, all the messages go to the same partition. My goal is to distribute the logs among the 3 partitions so that 3 Logstash servers can read the messages off the 3 partitions.
Thanks