But if I switch the topic_id in both the input and output configuration to a simple string name (please see the commented lines), then it can read. Am I doing anything wrong in my Kafka input? Maybe that's not how I should append the date stamp to the prefix string?
On the output, the topic gets sprintf(@topic_id), but the input does not.
I wouldn't recommend that for the input anyway. For example, if a Kafka topic were lagging right at midnight, the input would stop reading from the topic that was behind.
Luckily you can use white_list and black_list instead of topic_id, which accept Java-compatible regular expressions. So try white_list => "f5-logs-wc1.*". That should pick up new topics as they appear.
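As a sketch, the input might look like this (option names assumed from the older logstash-input-kafka plugin this thread appears to use; the zk_connect address and group_id are placeholders):

```
input {
  kafka {
    # ZooKeeper connection string for the older consumer (placeholder host)
    zk_connect => "localhost:2181"
    # Regex instead of a fixed topic_id: matches every dated topic
    white_list => "f5-logs-wc1.*"
    group_id   => "logstash"
  }
}
```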
So correct me if I am wrong: will white_list => "f5-logs-wc1.*" allow me to read from multiple topics?
In other words, in the scenario you mentioned, where one topic is behind and a new topic has been created, will I be reading from two topics simultaneously?
Yes. In fact, if you create a topic every day, you will eventually be reading from a ton of topics. You should consider just having a single logging topic with its retention period (retention.ms) set to a value you are comfortable with. 7 days is nice.
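For reference, a minimal sketch of the broker-side setting that controls how long messages are kept before deletion (a server.properties fragment; per-topic overrides use the retention.ms topic config instead):

```
# server.properties fragment: delete log segments older than 7 days
# (168 hours is the stock broker default)
log.retention.hours=168
```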
Got it. Also, at the moment I have one partition for the topic but two consumers. Will both consumers read from the same partition and generate duplicate data, or will one be idle?
Also, what if I have more partitions than consumers? How will that work?
When the number of consumer threads exceeds the number of consumable partitions in a consumer group, the excess threads will idle. When there are fewer threads than partitions, all partitions in the group will be distributed amongst the threads, so some threads will read from more than one partition.
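The distribution above can be sketched with a toy round-robin assignment (an illustration of the idea, not Kafka's actual assignor code):

```python
def assign(partitions, consumers):
    """Distribute partitions round-robin over consumer threads in one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 1 partition, 2 consumers: the second consumer gets nothing and idles.
print(assign([0], ["c1", "c2"]))            # {'c1': [0], 'c2': []}

# 4 partitions, 2 consumers: each consumer reads two partitions.
print(assign([0, 1, 2, 3], ["c1", "c2"]))   # {'c1': [0, 2], 'c2': [1, 3]}
```

Either way, a partition is consumed by at most one thread per group, so no duplicates are produced within a consumer group.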