My current pipeline is:
Rsyslog -> Kafka -> Logstash -> ES (5 nodes)
I see a huge delay (around 10 hours) between the logs being processed by Kafka and the same logs showing up in my ES cluster.
The input and output plugin settings in Logstash look like this:
Input (kafka):
zk_connect => 'host:port'
topic_id => 'abc'
codec => json
Output (elasticsearch):
template => "/export/logstash_new/elasticsearch-template.json"
template_overwrite => true
manage_template => true
Kafka currently has 50 partitions.
When I run the pipeline for a short period (2 to 3 hours), I don't notice any delay.
However, the longer the pipeline runs, the larger the delay becomes.
How can I figure out where the problem currently resides?
Is Logstash failing to keep up with the data? Or is the problem in Elasticsearch indexing?
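One way to narrow this down is to watch the consumer lag on the Kafka side: for each partition, the latest offset in the topic minus the offset the Logstash consumer has committed. The sketch below only does that arithmetic. The function names and the offset numbers are made up for illustration, and it assumes the two offset maps have already been collected with whatever tooling is available.

```python
def partition_lag(end_offsets, committed_offsets):
    """Messages not yet consumed per partition, for one snapshot.

    Both arguments are {partition: offset} dicts (hypothetical input).
    A partition with no committed offset is treated as starting from 0.
    """
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}


def lag_trend(first_snapshot, second_snapshot):
    """Change in total lag between two snapshots.

    A positive value means the consumer fell further behind.
    """
    return sum(second_snapshot.values()) - sum(first_snapshot.values())


# Made-up offsets for 3 of the partitions, sampled some minutes apart.
lag_t0 = partition_lag({0: 1000, 1: 1200, 2: 900}, {0: 950, 1: 1100, 2: 900})
lag_t1 = partition_lag({0: 1600, 1: 1800, 2: 1500}, {0: 1000, 1: 1150, 2: 1400})
growth = lag_trend(lag_t0, lag_t1)
print(lag_t0, lag_t1, growth)
```

If the total lag keeps growing, the consuming side is not keeping up. That alone probably does not separate Logstash from Elasticsearch, because a slow Elasticsearch output can push back on Logstash and slow its consumption too. If the lag stays near zero while ES is still hours behind, the delay is more likely somewhere else in the chain.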
The current load is about 6k messages per minute, although it fluctuates.
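To put the numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes a steady 6k messages per minute (the real load fluctuates) and a hypothetical run of 24 hours before the 10 hour delay was observed; the run length is an assumption, not a measured value.

```python
def backlog_messages(rate_per_min, delay_hours):
    """Approximate number of messages waiting, assuming a steady input rate."""
    return int(rate_per_min * 60 * delay_hours)


def effective_consume_rate(rate_per_min, run_hours, delay_hours):
    """Average rate (messages per minute) that actually reached ES.

    Assumes that after run_hours the newest indexed log is delay_hours old,
    so only (run_hours - delay_hours) worth of input has been indexed.
    """
    return rate_per_min * (run_hours - delay_hours) / run_hours


RATE_PER_MIN = 6000   # from the load stated above
DELAY_HOURS = 10      # observed delay
RUN_HOURS = 24        # hypothetical run length, for illustration only

print(backlog_messages(RATE_PER_MIN, DELAY_HOURS))
print(effective_consume_rate(RATE_PER_MIN, RUN_HOURS, DELAY_HOURS))
```

Under those assumptions the backlog is about 3.6 million messages, and the pipeline would be indexing about 3,500 messages per minute on average, well below the incoming rate. A steady shortfall like that would fit a delay that keeps growing the longer the pipeline runs.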