Logstash 2.3.1 and Kafka offset issue

I have finally built a proper pipeline for Logstash to pull data from Kafka and insert it into Elasticsearch.

However, I seem to be having an issue, or possibly a self-induced configuration problem:

I need to pull from 8 topics; here is the general config for one of them:

input {
  kafka {
    auto_offset_reset => "smallest"
    reset_beginning => "true"
    consumer_id => "logf003"
    consumer_threads => "6"                        # threads per Logstash instance
    group_id => "kafka_DataMetrics"                # consumer group shared by all instances
    topic_id => "DataMetrics_SharedDB"             # one of the 8 topics
    type => "kafka-datametrics"
    zk_connect => "zookeeper.service.consul:2181"
  }
}

So my issue is this: if I use auto_offset_reset => "smallest", won't my 3 Logstash forwarders always go back to offset #1 in a topic whenever they crash? Wouldn't that cause a lot of data to be resubmitted? Given I have over 2 billion messages to load into Elasticsearch, wouldn't a crash and restart really set my processing back?

If I use the defaults, my Logstash forwarders never pull all the data from the start of my Kafka topics.

What should I use in this case so that I can:

  1. Load all historical data, up to the current offset, from my Kafka topics
  2. Ensure that, should my Logstash instances crash, they restart from an appropriate place and do not pull/send duplicate entries to Elasticsearch?

I will also add that all my Logstash instances appear to be pulling the exact same data at the same time, because the message rates tracked by my metrics are all exactly the same.

Thanks for any advice.

Take out reset_beginning, as it is destructive for consumer groups. That
toggle says "delete the stored offsets for my group," so if a crash occurred
you would lose all of the group's progress on restart. Removing it should
fix your problem.

auto_offset_reset, on the other hand, is only applicable when the consumer
group doesn't exist yet, or when the group's stored offset for a topic is
out of range for that topic; otherwise consumption simply resumes from the
group's last committed offset.
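
For example, the same input with reset_beginning removed would look
something like this (a sketch only; option names as in the Logstash 2.3
kafka input plugin, with your group/topic values carried over):

    input {
      kafka {
        # No stored offset yet (first run): start from the earliest message.
        # On later restarts, the group resumes from its last committed offset.
        auto_offset_reset => "smallest"
        # reset_beginning omitted, so the group's committed offsets in
        # Zookeeper survive restarts instead of being deleted.
        consumer_id => "logf003"
        consumer_threads => "6"
        group_id => "kafka_DataMetrics"
        topic_id => "DataMetrics_SharedDB"
        type => "kafka-datametrics"
        zk_connect => "zookeeper.service.consul:2181"
      }
    }

With that, "smallest" only kicks in on the group's first run (or if an
offset falls out of retention); a normal crash and restart picks up where
the group left off.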

Ok, that corrected my understanding of the two settings.

I now run with auto_offset_reset => "smallest" (and without reset_beginning), and things seem to be moving along quite well.

Thanks!