I am using Logstash to read from Kafka. My VM has 6 processors.
I looked at the following two settings:
pipeline.workers (default: the number of the host’s CPU cores)
The number of workers that will, in parallel, execute the filter and output stages of the pipeline. If you find that events are backing up, or that the CPU is not saturated, consider increasing this number to better utilize machine processing power.
pipeline.output.workers (default: 1)
The number of workers to use per output plugin instance.
Since each kafka input is processed in a single thread, should I split it into multiple kafka inputs and set pipeline.output.workers: 6 to increase parallelism?
Is this a good approach to maximize the usage of my VM? Here is my current config:
input {
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
  }
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
  }
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
  }
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
  }
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
  }
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
  }
}
output {
  elasticsearch {
    hosts => ["host1", "host2", "host3"]
    index => "logstash-myindex-%{+YYYY.MM.dd}-1"
  }
}
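For comparison, the alternative I am weighing is a single kafka input that uses the plugin's consumer_threads option instead of six copies of the same block. This is only a sketch: I am assuming my plugin version supports consumer_threads, and that MyTopic has at least 6 partitions, since consumers in the same group beyond the partition count would sit idle. The value 6 simply mirrors the VM's core count and is not a tuned number.

```
input {
  kafka {
    bootstrap_servers => "kfk1:9092,kfk2:9092"
    topics => ["MyTopic"]
    group_id => "kafka-test101"
    # Assumption: consumer_threads is available in my kafka input plugin version.
    # 6 matches the VM's core count; it only helps if MyTopic has >= 6 partitions.
    consumer_threads => 6
  }
}
```

If this is equivalent to the six separate inputs above, I would prefer it because there is only one block to keep in sync.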