Logstash <- Kafka over WAN

shaharmor · March 11, 2017, 7:35pm

Hi,

I'm looking for the best configuration for a high-load system where Logstash runs in the Azure cloud and Kafka runs in AWS.

At the moment I have 3 Kafka servers with 20 partitions for the topic.

I've created an auto scaling group in Azure for all the Logstash servers, but even then I can't get them to fully use their CPUs (they have 4 CPUs each), which means I'm wasting resources, and the queue keeps filling up.

Any ideas how I can optimize the configuration for pulling over a WAN?
Thanks.

Christian_Dahlqvist · March 12, 2017, 1:43pm

What does your current Logstash configuration look like? Providing this will probably make it easier for the community to help.

shaharmor · March 12, 2017, 8:23pm

Right now I'm using this:

input {
    kafka {
        group_id => "azure"
        rebalance_max_retries => 100
        consumer_threads => 1
        topic_id => "my-topic"
        fetch_message_max_bytes => 5242880
        zk_connect => "zk1:2181,zk2:2181,zk3:2181"
    }
}

filter {
    date {
        match => ["timestamp", "UNIX_MS"]
        remove_field => ["timestamp"]
    }
}

output {
    elasticsearch {
        hosts => ["es1", "es2", "es3", "es4"]
        manage_template => false
        workers => 20
        flush_size => 2000
        idle_flush_time => 5
    }
}

eperry · March 12, 2017, 10:39pm

Not that this is an exact comparison, but maybe some of the ideas will help.

I found that you need to make sure you pull multiple events at a time and compress the data. I also found that the reconnect and polling intervals can really hurt over a WAN, as the traffic is very chatty.

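output {
    # Kafka is what the Logstash shippers send to
    kafka {
        # Comma-separated list of Kafka brokers
        bootstrap_servers => "zk:9092,......."
        # Compress batches before they cross the WAN
        compression_type => "gzip"
        # Back off for 30 s between retries and reconnects to cut down on chatter
        retry_backoff_ms => 30000
        reconnect_backoff_ms => 30000
        # Wait up to 5 s so more events are batched into each request
        linger_ms => 5000
        # Topic name is taken from each event's dst_index field
        topic_id => '%{dst_index}'
        #workers => 1
    }
}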
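
On the consuming side, the same ideas (bigger fetches, fewer round trips, longer backoffs) would go on the kafka input. This is only a rough, untested sketch: it assumes a newer kafka input plugin that connects to the brokers directly through bootstrap_servers rather than zk_connect, the broker names are placeholders, and every value is a starting point to experiment with, not a measured recommendation. Option names and types differ between plugin versions, so check them against the version you have installed.

input {
    kafka {
        # Placeholder broker addresses (assumption, not from the original config)
        bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
        group_id => "azure"
        topics => ["my-topic"]
        # More than one thread per instance; threads beyond the topic's
        # 20 partitions (summed over the whole group) would sit idle
        consumer_threads => 4
        # Ask the broker to hold a fetch until about 1 MB is ready, or 1 s has passed
        fetch_min_bytes => "1048576"
        fetch_max_wait_ms => "1000"
        # Upper bound on data returned per partition in one fetch
        max_partition_fetch_bytes => "5242880"
        # Larger TCP receive buffer for a high-latency link
        receive_buffer_bytes => "1048576"
        # Same long backoffs as on the output side
        retry_backoff_ms => "30000"
        reconnect_backoff_ms => "30000"
    }
}

Compression is chosen by whoever produces into the topic, so nothing needs to be set for it on the consumer.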

Hope this gives you some ideas.

system · April 9, 2017, 10:39pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
