I am trying to run a small pool of Logstash instances with at-most-once delivery semantics against a Kafka 0.10.2.1 cluster. The kafka input is using the new-style consumer group, whose offsets are stored as metadata in the brokers rather than tracked in ZooKeeper. I have set a short auto-commit interval so that committed offsets stay current when partitions move between consumers.
input {
  kafka {
    bootstrap_servers => "${KAFKA_BOOTSTRAP_SERVERS}"
    topics => ["proofpoint", "fortigate", "windows"]
    group_id => "logstash"
    auto_offset_reset => "latest"
    auto_commit_interval_ms => "50"
    codec => "json"
    consumer_threads => 1
    partition_assignment_strategy => "org.apache.kafka.clients.consumer.RoundRobinAssignor"
  }
}
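For reference, the group's broker-side committed offsets can be inspected with Kafka's own tooling (a sketch, assuming the CLI scripts from the 0.10.2.1 distribution are on the path and KAFKA_BOOTSTRAP_SERVERS expands to a reachable host:port list):

# Describe the "logstash" consumer group: shows per-partition
# CURRENT-OFFSET, LOG-END-OFFSET, and LAG as stored in the brokers.
# (Older 0.10.x builds may also require the --new-consumer flag.)
bin/kafka-consumer-groups.sh \
  --bootstrap-server "$KAFKA_BOOTSTRAP_SERVERS" \
  --describe --group logstash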
My elasticsearch output config:
output {
  elasticsearch {
    hosts => ["${ELASTICSEARCH_CLIENT_HOST}"]
    index => "@%{vtype}-nad-v2-%{date}"
    flush_size => 1000
    idle_flush_time => 10
    sniffing => true
    sniffing_delay => 30
  }
}
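To compare what actually landed against what Kafka holds, the per-index document counts can be pulled from Elasticsearch and set against the LOG-END-OFFSET totals above (a sketch, assuming ELASTICSEARCH_CLIENT_HOST expands to a plain host:port and the indices follow the @<vtype>-nad-v2-<date> pattern from the config):

# Per-index doc counts for the indices this pipeline writes;
# compare the totals per topic against the Kafka log-end offsets.
curl -s "http://${ELASTICSEARCH_CLIENT_HOST}/_cat/indices/@*-nad-v2-*?v&h=index,docs.count"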
Now I see in Elasticsearch that I have roughly 5x the data compared to what is in Kafka. What am I doing wrong?