But if I switch the topic_id in both the input and output configuration to a simple string name (please see the commented lines), then it can read. Am I doing anything wrong in my Kafka input? Maybe that's not how I should append the date stamp to the prefix string?
On the output, the topic gets sprintf(@topic_id), but the input does not.
I wouldn't recommend that for the input anyway. For example, if a Kafka topic were lagging right at midnight, the input would stop reading from the topic that was behind.
Luckily you can use white_list and black_list instead of topic_id, which accept Java-compatible regular expressions. So try white_list => "f5-logs-wc1.*". That should pick up new topics as they appear.
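As a sketch, the input might look like this (option names assumed from the older logstash-input-kafka plugin this thread appears to use; the zk_connect address and group_id are placeholders):

```
input {
  kafka {
    # ZooKeeper connection string for the older consumer (placeholder host)
    zk_connect => "localhost:2181"
    # Regex instead of a fixed topic_id: matches every dated topic
    white_list => "f5-logs-wc1.*"
    group_id   => "logstash"
  }
}
```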
So correct me if I am wrong: will white_list => "f5-logs-wc1.*" allow me to read from multiple topics?
In other words, in the scenario you mentioned, where one topic is behind and a new topic has been created, will I be reading from two topics simultaneously?
Yes. In fact, if you create a topic every day, you will eventually be reading from a ton of topics. You should consider just having a single logging topic with its retention period (retention.ms) set to a value you are comfortable with. 7 days is nice.
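For reference, a minimal sketch of the broker-side setting that controls how long messages are kept before deletion (a server.properties fragment; per-topic overrides use the retention.ms topic config instead):

```
# server.properties fragment: delete log segments older than 7 days
# (168 hours is the stock broker default)
log.retention.hours=168
```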
Got it. Also, at the moment I have one partition for the topic but two consumers. Will both consumers read from the same partition and generate duplicate data, or will one be idle?
Also, what if I have more partitions than consumers? How will that work?
When the number of consumer threads exceeds the number of consumable partitions in a consumer group, the excess threads will idle. When there are fewer threads than partitions, all partitions in the group will be distributed amongst the threads, so some threads will read from more than one partition.
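The distribution above can be sketched with a toy round-robin assignment (an illustration of the idea, not Kafka's actual assignor code):

```python
def assign(partitions, consumers):
    """Distribute partitions round-robin over consumer threads in one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 1 partition, 2 consumers: the second consumer gets nothing and idles.
print(assign([0], ["c1", "c2"]))            # {'c1': [0], 'c2': []}

# 4 partitions, 2 consumers: each consumer reads two partitions.
print(assign([0, 1, 2, 3], ["c1", "c2"]))   # {'c1': [0, 2], 'c2': [1, 3]}
```

Either way, a partition is consumed by at most one thread per group, so no duplicates are produced within a consumer group.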