Logstash 2.3.1 and Kafka offset issue

crazyphil · April 25, 2016, 2:19pm

I have finally built a proper pipeline for logstash to pull data from Kafka, and insert into elasticsearch.

However I seem to be having an issue, or possibly self induced configuration problem:

I need to pull from 8 topics, and here is the general config:

kafka {
        auto_offset_reset => "smallest"
        reset_beginning => "true"
        consumer_id => "logf003"
        consumer_threads => "6"
        group_id => "kafka_DataMetrics"
        topic_id => "DataMetrics_SharedDB"
        type => "kafka-datametrics"
        zk_connect => "zookeeper.service.consul:2181"
}

So my issue is this - if I use auto_offset_reset smallest, my 3 logstash forwarders if they crash would always go to offset #1 in a topic? Would this not cause a lot of data to be resubmitted? Given I have over 2 billion offsets to put into elasticsearch, wouldn't a crash and restart really set my processing back?

If I use the defaults, my logstash forwarders never pull all the data from the start of my kafka topics.

What should I use in this case so that I can:

Load all data up to current from my Kafka topics
Ensure that should my logstash instances crash, they will restart in an appropriate place, and not pull/send duplicate entries to elasticsearch?

I will also add that it would appear that all my logstash instances are pulling the exact same data at the same time because their message rates as tracked by metrics are all exactly the same.

Thanks for any advice.

Joe_Lawson · April 27, 2016, 2:16pm

Take out the offset reset as it is destructive for consumer groups. The
toggle says, delete my offsets for my group so if a crash occurred you will
lose all progress. That should fix your problem.

Joe_Lawson · April 27, 2016, 2:18pm

Reset beginning is only applicable when either a consumer group doesn't
exist or the consumer group's topic offset is out of bounds with the
current topic.

crazyphil · May 2, 2016, 6:14pm

Ok, that corrected my understanding of the two settings.

I now run with "smallest" offset as the default, and things seem to be moving along quite well.

Thanks!

Topic		Replies	Views
How to reset / revert Kafka offset of a consumer group used by Logstash? Logstash	1	1067	February 10, 2017
Logstash Kafka Offset Handling Logstash	1	1167	February 5, 2018
How to re fetch data from Kafka, from the beginning of topic, without changing consumer group? (resetting consumer group offset) Logstash	2	5370	June 9, 2017
Ingesting latest logs from kafka with logstash Logstash	3	412	February 12, 2020
Logstash not committing offset to Kafka Logstash	1	1535	April 6, 2017

Logstash 2.3.1 and Kafka offset issue

Related topics