Hi,
I'm trying to use the Kafka consumergroup metricset in Metricbeat to report global lag per consumer group. For simplicity's sake, let's say one consumer group == one topic.
The problem is that lag is reported per partition, so if I want a global view across all partitions I need to sum kafka.consumergroup.consumer_lag grouped by kafka.consumergroup.id.
Since I know my Metricbeat is configured to send metrics every 30 seconds, I can get an accurate number by changing the Interval from 'auto' to '30s'. This, however, prevents me from displaying data over a long period of time, as I hit the maximum number of buckets. Setting it back to auto lets me view longer periods, but the numbers are inaccurate because several 30s intervals get grouped and summed together.
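For reference, the raw query I'm effectively trying to run looks roughly like this in Dev Tools. This is just a sketch assuming the default metricbeat-* index pattern and that the documents carry metricset.name: consumergroup (on Elasticsearch versions before 7.2, fixed_interval would be interval):

```
GET metricbeat-*/_search
{
  "size": 0,
  "query": {
    "term": { "metricset.name": "consumergroup" }
  },
  "aggs": {
    "per_consumer_group": {
      "terms": { "field": "kafka.consumergroup.id" },
      "aggs": {
        "per_30s": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "30s"
          },
          "aggs": {
            "total_lag": {
              "sum": { "field": "kafka.consumergroup.consumer_lag" }
            }
          }
        }
      }
    }
  }
}
```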
That's an interesting problem. I will assume that the number of documents you get per interval is consistent (I am not familiar with that Kafka metricset). What I would do is switch to a Math aggregation in TSVB and divide the value of that sum by the number of 30-second intervals that fit in your set interval (could be 1h, 1 day, or whatever); that would give you the average consumer_lag over that interval.
Not sure how useful that will be for your use case, since you lose resolution, meaning you'll miss short lag peaks as they would be flattened by the average.
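If it helps to see the same idea outside of TSVB, a rough equivalent in query DSL would be something like the sketch below: sum the lag per bucket, then divide by the number of 30-second reporting cycles in that bucket (120 for a 1h bucket, assuming your 30s period). The my-group id and the metricbeat-* index pattern are just placeholders for your setup:

```
GET metricbeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "metricset.name": "consumergroup" } },
        { "term": { "kafka.consumergroup.id": "my-group" } }
      ]
    }
  },
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h"
      },
      "aggs": {
        "summed_lag": {
          "sum": { "field": "kafka.consumergroup.consumer_lag" }
        },
        "avg_total_lag": {
          "bucket_script": {
            "buckets_path": { "total": "summed_lag" },
            "script": "params.total / 120"
          }
        }
      }
    }
  }
}
```

Same caveat as above: avg_total_lag is an average over the whole bucket, so short spikes within the hour won't show up.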