My current pipeline is:
Rsyslog -> Kafka -> Logstash -> ES (5 nodes)
I see a huge delay (around 10 hours) between the time logs are processed by Kafka and the time they show up in my ES cluster.
The input and output sections of my Logstash config look like this:
input {
  kafka {
    zk_connect => 'host:port'
    topic_id => 'abc'
    consumer_threads => 50
    codec => json
  }
}
The Kafka topic currently has 50 partitions.
output {
  elasticsearch {
    template => "/export/logstash_new/elasticsearch-template.json"
    hosts => ["host1", "host2", "host3", "host4", "host5"]
    template_overwrite => true
    manage_template => true
    codec => plain
  }
}
When I run the pipeline for a short duration (2 to 3 hours), I don't notice any delay.
However, the longer it runs, the larger the delay becomes.
How can I figure out where the bottleneck is?
Is Logstash failing to keep up with the data, or is the problem in Elasticsearch indexing?
The current load is about 6k messages per minute (it fluctuates).
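To put the numbers in perspective, here is a rough back-of-the-envelope estimate in Python of what a 10-hour lag implies at this load. It assumes a steady 6k messages per minute, which is a simplification since the real load fluctuates; the constant and function names are only illustrative.

```python
# Rough estimate only: assumes a steady rate, although the real load fluctuates.
MESSAGES_PER_MINUTE = 6_000  # load stated above
DELAY_HOURS = 10             # observed lag between Kafka and ES


def rate_per_second(rate_per_minute: float) -> float:
    """Convert a per-minute message rate to a per-second rate."""
    return rate_per_minute / 60


def backlog_messages(rate_per_minute: float, delay_hours: float) -> int:
    """Messages produced during the delay window, i.e. the approximate backlog."""
    return int(rate_per_minute * 60 * delay_hours)


per_second = rate_per_second(MESSAGES_PER_MINUTE)
backlog = backlog_messages(MESSAGES_PER_MINUTE, DELAY_HOURS)

print(f"ingest rate: {per_second:.0f} msg/s")
print(f"approx. backlog behind a {DELAY_HOURS} h lag: {backlog:,} messages")
```

So under that assumption the pipeline only has to sustain about 100 messages per second, and a 10-hour lag corresponds to a backlog on the order of 3.6 million messages.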