Logstash <- Kafka over WAN

Hi,

I'm looking for the best configuration for a high-load system where Logstash runs in Azure and Kafka runs in AWS.

At the moment I have three Kafka brokers and 20 partitions for the topic.

I've created an auto scaling group in Azure for the Logstash servers, but even then I can't get them to full CPU usage (they have 4 CPUs each). That means I'm wasting resources, and the queue keeps filling up.

Any ideas on how I can optimize the configuration for pulling over a WAN?
Thanks.

What does your current Logstash configuration look like? Providing this will probably make it easier for the community to help.

Right now I'm using this:

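input {
    kafka {
        group_id => "azure"                  # consumer group shared by all Logstash instances
        rebalance_max_retries => 100
        consumer_threads => 1                # one consumer thread per Logstash instance
        topic_id => "my-topic"               # topic with 20 partitions
        fetch_message_max_bytes => 5242880   # bytes to fetch per partition per request (5 MB)
        zk_connect => "zk1:2181,zk2:2181,zk3:2181"  # ZooKeeper ensemble used by this (older) consumer
    }
}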

filter {
    date {
        match => ["timestamp", "UNIX_MS"]
        remove_field => ["timestamp"]
    }
}

output {
    elasticsearch {
        hosts => ["es1", "es2", "es3", "es4"]
        manage_template => false
        workers => 20
        flush_size => 2000
        idle_flush_time => 5
    }
}
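One setting that stands out above is consumer_threads => 1. Kafka assigns each partition of a topic to at most one consumer thread in a group, so across all Logstash instances in group "azure" at most 20 threads can be doing useful work, and with one thread per instance each 4-CPU box is fed by a single stream. A minimal sketch, assuming the same ZooKeeper-based kafka input as above and, hypothetically, five Logstash instances (5 x 4 = 20 threads); the thread count is an illustrative assumption, not a tested value:

input {
    kafka {
        group_id => "azure"
        topic_id => "my-topic"
        zk_connect => "zk1:2181,zk2:2181,zk3:2181"
        # Hypothetical: 4 threads per instance. Keep (instances x threads) <= 20,
        # because threads beyond the partition count get no partitions and sit idle.
        consumer_threads => 4
        fetch_message_max_bytes => 5242880
        rebalance_max_retries => 100
    }
}

If the auto scaling group grows beyond the point where instances x threads exceeds 20, the extra consumers will not receive any partitions; adding partitions to the topic is what allows more parallel consumers.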

This isn't an exact comparison, but maybe some of the ideas will help.

I found that you need to make sure you pull multiple events per request and compress the data. I also found that short reconnect and polling intervals really hurt over a WAN, because the traffic is very chatty.

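output {
    # Logstash shippers send their events to Kafka
    kafka {
        # Comma-separated list of Kafka brokers
        bootstrap_servers => "zk:9092,......."
        # Compress batches before they cross the WAN
        compression_type => "gzip"
        # Wait 30 s before retrying a failed request or reconnecting to a broker
        retry_backoff_ms => 30000
        reconnect_backoff_ms => 30000
        # Buffer events for up to 5 s so they are sent in larger batches
        linger_ms => 5000
        # Topic name is taken from each event's dst_index field
        topic_id => '%{dst_index}'
        #workers => 1
    }
}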
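The same "fewer, bigger requests" idea can be applied on the consuming side. A sketch, assuming a newer kafka input plugin (logstash-input-kafka 5.x or later, which connects to the brokers through bootstrap_servers instead of ZooKeeper); the broker hostnames are hypothetical placeholders and the numbers are illustrative starting points, not tuned values. The numeric options are quoted because some plugin releases typed them as strings, so check each option's type and default in the docs for your version:

input {
    kafka {
        bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"   # hypothetical hostnames
        topics => ["my-topic"]
        group_id => "azure"
        consumer_threads => 4
        # The broker answers a fetch once at least 1 MB is available or 500 ms
        # have passed, so each WAN round trip carries more events
        fetch_min_bytes => "1048576"
        fetch_max_wait_ms => "500"
        # Upper bound on data returned per partition per fetch (5 MB)
        max_partition_fetch_bytes => "5242880"
        # Back off longer between reconnects and retries to cut WAN chatter
        reconnect_backoff_ms => "10000"
        retry_backoff_ms => "10000"
    }
}

Compression itself is set on the producing side (compression_type in the shipper output); the consumer decompresses the batches as it reads them.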

Hope this gives you some ideas.
