Kafka offset metrics

I am curious whether the metrics kafka.partition.offset.oldest and kafka.partition.offset.newest are the same as CURRENT-OFFSET and LOG_END_OFFSET respectively?

In the console, both CURRENT-OFFSET and LOG_END_OFFSET show the same value, but kafka.partition.offset.oldest and kafka.partition.offset.newest are different:

Is it a bug?

Seems like you're mixing consumer and partition metrics here.

The CURRENT-OFFSET is the last offset processed by a consumer, and LOG_END_OFFSET is the offset of the last event written to the partition by a producer.

Kafka is not a real queue in the sense of "consume once and the data is gone". It stores all data on disk. Every event stored by Kafka gets an offset (basically an ID, as the offset is increased by 1 for every event). When storing events, Kafka splits topics into partitions and partitions into segments. Depending on the retention policy (by default time-based only) and the segment size configuration, Kafka marks old segments as 'deleted' (they are cleaned up much later by some GC worker). All of this must be taken into account when sizing Kafka (always monitor disk usage).

Every segment holds a range of events, so each segment has a start and an end offset, and the number of events in a segment = end - start. Given that a partition has many segments, the oldest offset is the start offset of the oldest segment and the newest offset is the end offset of the most recent/active segment. That is, the number of events stored on disk = kafka.partition.offset.newest - kafka.partition.offset.oldest.
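
To make the arithmetic concrete, here is a minimal sketch in Python. The function name and the sample offsets are made up for illustration and are not taken from metricbeat or from this thread. It also assumes offsets are contiguous; on compacted topics there can be gaps, so treat the result as an upper bound.

```python
def retained_events(oldest: int, newest: int) -> int:
    """Approximate number of events still on disk for one partition.

    oldest: start offset of the oldest segment (kafka.partition.offset.oldest)
    newest: end offset of the active segment (kafka.partition.offset.newest)
    """
    if newest < oldest:
        raise ValueError("newest offset must not be smaller than oldest offset")
    return newest - oldest


# Hypothetical sample values: partition id -> (oldest, newest).
partitions = {0: (1200, 5000), 1: (300, 300)}

for partition, (oldest, newest) in sorted(partitions.items()):
    print(partition, retained_events(oldest, newest))
```

When both metrics report the same value for a partition, the difference is 0, i.e. no events are currently retained for it.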

Hi Steffen, thank you for your explanation. I am wondering what are the proper metrics to calculate a consumer lag then.

CONSUMER LAG = LOG_END_OFFSET - CURRENT-OFFSET. The lag is computed per consumer group (LEFT JOIN on topic, partition, consumer group). As each partition acts as a queue of its own, the lag has to be computed per partition. With the lag per partition, one can compute the total consumer lag and the max consumer lag, and check that the lag is roughly evenly distributed across the partitions for one consumer group.
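
A rough sketch of that computation in Python, for a single consumer group. The function name, the dictionary layout and the sample numbers are assumptions for illustration only; in practice the two inputs would come from whatever reports LOG_END_OFFSET and CURRENT-OFFSET for you.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]


def consumer_lag(
    log_end: Dict[TopicPartition, int],
    current: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Optional[int]]:
    """Lag per partition for one consumer group.

    Behaves like a LEFT JOIN from the partition offsets to the group's
    offsets: partitions without a CURRENT-OFFSET are kept with lag None.
    """
    lag: Dict[TopicPartition, Optional[int]] = {}
    for tp, end_offset in log_end.items():
        group_offset = current.get(tp)
        lag[tp] = None if group_offset is None else max(end_offset - group_offset, 0)
    return lag


# Hypothetical sample data for a topic named "events".
log_end = {("events", 0): 5000, ("events", 1): 4200, ("events", 2): 100}
current = {("events", 0): 4990, ("events", 1): 4000}

lag = consumer_lag(log_end, current)
known = [value for value in lag.values() if value is not None]
total_lag = sum(known)
max_lag = max(known, default=0)
print(lag, total_lag, max_lag)
```

Comparing the individual values in `lag` against `max_lag` gives a quick check of whether one partition is falling behind the others.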

In theory the consumergroup metricset can be used to get the missing stats in metricbeat. Unfortunately this feature (still in beta btw.) has a bug and does not collect the data properly.

Btw, I can see both metrics kafka.partition.offset.oldest and kafka.partition.offset.newest have the same value now after 2 days.

And the same chart looks very different today (first screen-shot is from June 24th, around 16:00)

The Kafka distribution includes some command line tools you can use to query your Kafka cluster for partitions and offsets. These are the offsets as reported by Kafka. To me it looks like data/segments have finally been deleted due to the time-based retention policy?
