I am having trouble getting multiple Logstash instances to read from a single Kafka topic.
The Logstash instances are all on the same server, together with Kafka and ZooKeeper. When I check the topic with the kafka-console-consumer and kafka-console-producer scripts, everything works and the messages are delivered.
I have a Filebeat instance that writes to a Kafka topic; the topic has 3 partitions and a replication factor of 1 (single Kafka node). Each Logstash instance has its own config file with distinct values for:
node.name
path.data
path.logs
Each instance also has a pipeline config file that looks like this (the number in logstashN is set separately in each instance's config file):
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => "netflow"
    group_id => "logstash"
    consumer_threads => 3
    client_id => "logstash1"
    client_rack => "rack-1"
    #partition_assignment_strategy => "round_robin"
  }
}
filter {
  mutate {
    add_field => {"logstash_id" => "logstash1"}
  }
}
output {
  elasticsearch {
    hosts => ["xxx.xxx.xxx:9200"]
    index => "kafkagenerated-%{+YYYY.MM.dd}"
    id => "logstash1"
  }
}
I start the Logstash instances each with their own config files like so:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/kafka-input.conf
/usr/share/logstash1/bin/logstash -f /etc/logstash1/conf.d/kafka-input.conf
/usr/share/logstash2/bin/logstash -f /etc/logstash2/conf.d/kafka-input.conf
They all start without error and consume the messages from the "netflow" topic. Also, when I check the group with:
bin/kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group logstash
I get output that looks as if the consumers were load-balanced and every Logstash instance did some work and indexed to Elasticsearch. This was captured under high load, with a lot of incoming netflow data, so the lag is understandable:
GROUP     TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG    CONSUMER-ID                                       HOST        CLIENT-ID
logstash  netflow  0          3189623         3245768         56145  logstash0-0-ff1448c5-2742-4247-82fc-212d0a894f63  /127.0.0.1  logstash0-0
logstash  netflow  2          1398036         1441479         43443  logstash0-2-9598982e-8565-4fd8-8f9a-85ada0b10f07  /127.0.0.1  logstash0-2
logstash  netflow  1          1409207         1441478         32271  logstash0-1-7f14eb14-38d3-4f9f-bc25-0fdc582ce835  /127.0.0.1  logstash0-1
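To make it easier to see which instance owns which partitions, the describe output can be summarized per client. This is only a minimal sketch: it assumes CLIENT-ID is the last column and follows the <client_id>-<thread index> pattern seen in the listing, and partitions_per_client is a hypothetical helper name, not part of Kafka.

```shell
# Count assigned partitions per Logstash client, reading the output of
# `kafka-consumer-groups.sh --describe` on stdin.
# Assumption: CLIENT-ID is the last column and looks like <client_id>-<thread index>.
partitions_per_client() {
  awk '
    $1 == "GROUP" || NF == 0 { next }   # skip the header row and blank lines
    {
      client = $NF                      # e.g. "logstash0-1"
      sub(/-[0-9]+$/, "", client)       # drop the thread suffix, e.g. "logstash0"
      count[client]++                   # one row per partition
    }
    END { for (c in count) print c, count[c] }
  ' | sort
}
```

It can be fed directly from the command used earlier, for example: bin/kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group logstash | partitions_per_client. Each output line is a client_id followed by the number of partitions currently assigned to it.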
However, when I check inside Elasticsearch, all the documents have the same logstash_id, meaning that all the events are written by a single Logstash instance (I have checked the config files; every instance has its own number in logstash_id).
To validate this, I shut down all instances but one, and the indexing rate in Elasticsearch was the same as with all three instances running.
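The per-instance check in Elasticsearch can be done with a terms aggregation on logstash_id. This is a minimal sketch under assumptions: logstash_id is taken to be dynamically mapped with a .keyword sub-field, and AGG_QUERY and docs_per_instance are hypothetical names chosen for illustration.

```shell
# Terms aggregation: count documents per logstash_id value.
# Assumption: logstash_id has a ".keyword" sub-field (default dynamic mapping).
AGG_QUERY='{"size":0,"aggs":{"per_instance":{"terms":{"field":"logstash_id.keyword"}}}}'

# Hypothetical helper; $1 is the Elasticsearch base URL (scheme, host, port).
docs_per_instance() {
  curl -s -H 'Content-Type: application/json' \
    "$1/kafkagenerated-*/_search" -d "$AGG_QUERY"
}
```

Calling docs_per_instance "http://xxx.xxx.xxx:9200" returns one bucket per distinct logstash_id with its document count, so a single bucket means only one instance has been indexing.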
My goal is to run as many Logstash instances as needed to reach an indexing rate of around 50k documents per second. In a production environment, the instances would run on multiple machines.
Can someone please give me an idea how to troubleshoot this and how to get all the Logstash instances to work? Thank you in advance!