I'm looking for the best configuration for a high-load system where Logstash runs in the Azure cloud and Kafka runs in AWS.
At the moment I have 3 Kafka servers, with 20 partitions for the topic.
I've created an auto scaling group in Azure for all the Logstash servers, but even then I can't get them to reach full CPU usage (they have 4 CPUs each), which means I'm wasting resources, and the queue keeps filling up.
Any ideas on how I can optimize the configuration for pulling over a WAN?
Thanks.
This isn't an exact comparison, but maybe some of the ideas will help.
I found that you need to make sure you pull multiple events at a time and compress the data. I also found that short reconnect and polling intervals really hurt over a WAN, because the traffic becomes very chatty.
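output {
  # Kafka is where the Logstash shippers send events
  kafka {
    # Comma-separated list of Kafka brokers
    bootstrap_servers => "zk:9092,......."
    # Compress batches to cut WAN traffic
    compression_type => "gzip"
    # Back off for 30 s between retries and reconnects to reduce chatter
    retry_backoff_ms => 30000
    reconnect_backoff_ms => 30000
    # Wait up to 5 s so more events are sent per request
    linger_ms => 5000
    # Topic name comes from each event's dst_index field
    topic_id => '%{dst_index}'
    #workers => 1
  }
}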
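Since the question is about Logstash consuming from Kafka, the same ideas would map onto the kafka input plugin roughly as follows. This is only a sketch, not a tested setup: the broker names, topic, group id and every numeric value are placeholder assumptions to tune for your own link, and depending on the plugin version some of these options expect quoted strings instead of numbers.

```
input {
  kafka {
    # Placeholder broker list, topic and group id; substitute your own
    bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
    topics => ["logstash"]
    group_id => "logstash"
    # Threads across all Logstash instances should add up to about the
    # partition count (20 here); extra threads beyond that sit idle
    consumer_threads => 4
    # Let the broker hold a fetch until about 1 MB is ready (or 1 s passes)
    # so each WAN round trip carries more data
    fetch_min_bytes => 1048576
    fetch_max_wait_ms => 1000
    # Hand more records to the pipeline per poll
    max_poll_records => 5000
    # Larger TCP receive buffer for a high-latency link
    receive_buffer_bytes => 1048576
    # Same backoff idea as on the output side
    reconnect_backoff_ms => 30000
    retry_backoff_ms => 30000
  }
}
```

Compression is chosen by whoever produces into Kafka, so there is nothing to set for it on the input side. Consumers receive and decompress whatever the producers sent.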