Kafka offset metrics

John_06 · June 24, 2017, 11:24pm

I am curious whether the metrics kafka.partition.offset.oldest and kafka.partition.offset.newest are the same as CURRENT-OFFSET and LOG_END_OFFSET, respectively.

On the console, both CURRENT-OFFSET and LOG_END_OFFSET show the same value, but kafka.partition.offset.oldest and kafka.partition.offset.newest are different.

Is it a bug?

steffens · June 26, 2017, 11:45am

Seems like you're mixing consumer and partition metrics here.

CURRENT-OFFSET is the last offset processed by a consumer, and LOG_END_OFFSET is the offset of the last event written to the partition by a producer.

Kafka is not a real queue in the sense of "consume once and the data is gone". It stores all data on disk. Every event stored by kafka gets an offset (which is basically an ID, as the offset is increased by 1 for every event). When storing events, kafka splits topics into partitions and partitions into segments. Given the retention policy (by default time-based only) and the segment size configuration, kafka will mark old segments as 'deleted' (they are cleaned up much later by a GC worker). All of this must be taken into account when sizing kafka (always monitor disk usage).

Every segment holds a range of events, so each segment has a start and an end offset, and the number of events in a segment = end - start. Given you have many segments for a partition, the oldest offset is the start offset of the oldest segment, and the newest offset is the end offset of the most recent (active) segment. That is, the number of events stored on disk = kafka.partition.offset.newest - kafka.partition.offset.oldest.
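As a rough illustration of the arithmetic above, here is a small Python sketch. The function name and the sample offsets are made up for the example; the two arguments stand for the values of kafka.partition.offset.oldest and kafka.partition.offset.newest for a single partition.

```python
def events_on_disk(oldest_offset: int, newest_offset: int) -> int:
    """Number of events still retained for one partition.

    oldest_offset: start offset of the oldest remaining segment
    newest_offset: end offset of the active segment
    (hypothetical helper, not part of kafka or metricbeat)
    """
    if newest_offset < oldest_offset:
        raise ValueError("newest offset must not be smaller than oldest offset")
    return newest_offset - oldest_offset


# Sample values only: retention has already removed offsets below 4000.
print(events_on_disk(oldest_offset=4000, newest_offset=10500))

# Once retention has removed every segment's data, both metrics match.
print(events_on_disk(oldest_offset=10500, newest_offset=10500))
```

When the two metrics report the same value, the result is 0, meaning no events are currently retained for that partition.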

John_06 · June 26, 2017, 9:08pm

Hi Steffen, thank you for your explanation. I am wondering what the proper metrics are for calculating consumer lag, then.

steffens · June 27, 2017, 8:37am

CONSUMER LAG = LOG_END_OFFSET - CURRENT-OFFSET. The lag is to be computed per consumer group (LEFT JOIN on topic, partition, consumer group). As each partition acts as a queue itself, the lag is to be computed per partition. Having the lag per partition, one can compute the total consumer lag and the max consumer lag, and check that the lag is (hopefully) about evenly distributed between the partitions for one consumer group.

In theory, the consumergroup metricset can be used to get the missing stats in metricbeat. Unfortunately, this feature (still in beta btw.) has a bug and does not collect the data properly.
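A minimal Python sketch of that computation for one consumer group, assuming the offsets have already been collected into plain dictionaries. The function names, the dictionary layout, and the sample numbers are all hypothetical; nothing here calls kafka or metricbeat.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def partition_lag(
    log_end: Dict[TopicPartition, int],
    current: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Optional[int]]:
    """LEFT JOIN of partitions with one group's committed offsets.

    Every partition is kept; a partition the group has no offset for
    gets a lag of None instead of being dropped.
    """
    lags: Dict[TopicPartition, Optional[int]] = {}
    for tp, end_offset in log_end.items():
        committed = current.get(tp)
        lags[tp] = None if committed is None else end_offset - committed
    return lags


def summarize(lags: Dict[TopicPartition, Optional[int]]) -> Dict[str, int]:
    """Total and max lag over the partitions with a known lag."""
    known = [lag for lag in lags.values() if lag is not None]
    return {"total": sum(known), "max": max(known, default=0)}


# Sample data: LOG_END_OFFSET per partition, CURRENT-OFFSET for one group.
log_end = {("events", 0): 1200, ("events", 1): 1180, ("events", 2): 900}
current = {("events", 0): 1150, ("events", 1): 1180}

lags = partition_lag(log_end, current)
print(lags)
print(summarize(lags))
```

Keeping the unmatched partition as None (rather than 0) makes it visible that the group has not committed anything there, which is different from being fully caught up.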

John_06 · June 27, 2017, 5:50pm

Thanks.
Btw, I can see both metrics kafka.partition.offset.oldest and kafka.partition.offset.newest have the same value now after 2 days.

And the same chart looks very different today (first screen-shot is from June 24th, around 16:00)

steffens · June 28, 2017, 10:10am

The kafka distribution includes some command line tools you can use to query your kafka cluster for partitions and offsets. These are the offsets as reported by kafka. To me it looks like the data/segments have finally been deleted due to the time-based retention policy.

system · July 26, 2017, 10:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
