I have data coming from Burrow (Kafka consumer metrics) that I poll using the logstash http_poller input.
It gives me nice JSON that I store directly in ES (default mapping for the moment).
For my test, I have a single Kafka topic (named "test"), split into 5 partitions. Burrow gives me the current offset of each partition, with the values in an array.
Here's what a document looks like:
I would like to create a simple histogram in Kibana showing the max of each array cell (i.e. the current value).
Can this be done directly? I'd rather not sum the values, because Kafka consumption is partition-based; next I will graph the consumer offsets to visualize the gap between writes and consumption.
I'm not sure I understand what it is you are trying to create a graph from.
the max of each array cell (aka the current value).
Do you mean you want to graph the max value from that array? If so, I can't think of a way to do this directly in Kibana, you'd have to enrich the data before you index it into Elasticsearch, something that should be pretty easy to add to your existing logstash pipeline.
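For example, the enrichment could be a ruby filter in the pipeline. This is a minimal sketch, assuming the array field is named `offsets` and a recent logstash event API (`event.get`/`event.set`):

```
filter {
  ruby {
    code => '
      offsets = event.get("offsets")
      # Store the highest offset in the array as its own field,
      # so Kibana can aggregate on it directly
      event.set("offsets_max", offsets.max) if offsets.is_a?(Array)
    '
  }
}
```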
If I'm wrong in my understanding, can you please try explaining it again?
In Kafka, a topic is split into partitions, each consumed independently by the consumers. In my example, my topic has 5 partitions, whose offsets are in the `offsets` array.
Partitions may not be consumed and/or produced evenly (one partition can grow bigger than the others), so I don't want to sum all the cells.
I would like to create a histogram with a column for each partition, showing the max value of each (actually the current value, since the value only goes up): max(offsets[0]), max(offsets[1])...
Ah, that was the other way I was thinking, but I wasn't sure if you reliably knew if the indices of the array would always line up - sounds like that's the case though.
Unfortunately, Kibana doesn't have a way to do this with your current data structure, you'd need to change or enrich it. Two options I can think of:
You could add a separate field for each index of the array and use a Filters agg to show each of those fields on the x-axis
You could turn each of those values into their own document in ES, with a different name/description/type for each value and use a Terms agg on that field
In both cases, you'd use a Max value for the metric
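For the first option, a ruby filter could flatten the array into one field per index. A sketch, again assuming the field is named `offsets`:

```
filter {
  ruby {
    code => '
      offsets = event.get("offsets")
      if offsets.is_a?(Array)
        # Create offset_0, offset_1, ... so each partition
        # gets its own top-level field to aggregate on
        offsets.each_with_index do |value, i|
          event.set("offset_#{i}", value)
        end
      end
    '
  }
}
```

In Kibana you would then define one filter per `offset_N` field and use Max as the metric for each.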
That's what I'm trying to do: use the logstash split filter to create one document per partition, but I'm struggling with it; it doesn't seem stable. I think I will file an issue right now.
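Here is roughly what I'm trying (a sketch from my own pipeline; field names like `partitions` are my choice, not from Burrow):

```
filter {
  # Wrap each offset with its partition number, so the split
  # filter can produce one event per partition
  ruby {
    code => '
      offsets = event.get("offsets") || []
      parts = offsets.each_with_index.map { |v, i|
        { "partition" => i, "offset" => v }
      }
      event.set("partitions", parts)
    '
  }
  split { field => "partitions" }
  mutate {
    rename => { "[partitions][partition]" => "partition" }
    rename => { "[partitions][offset]"    => "offset" }
    remove_field => [ "partitions", "offsets" ]
  }
}
```

Each resulting document then carries a `partition` number to use in a Terms agg, with Max of `offset` as the metric.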