I am trying to measure Nginx latency for a given load, but I'm having trouble because I seem unable to aggregate 'count' into buckets. I can do the opposite relatively simply: bucketing Nginx latency into 25ms intervals and getting the count for each bucket.
Effectively, I want to flip the axes on this visualization.
Say, counting the documents in each minute, bucketing those per-minute counts into intervals of, say, 10,000, and returning the average latency for each bucket on the y-axis. Is there something fundamentally incorrect about the way I'm trying to aggregate data? Or should we be able to aggregate on the count of docs? Is there any workaround for this issue?
Hey Seth, thanks for posting your question. I'm using Kibana 6 to demonstrate a solution but I believe your version supports the same functionality. Take a look at how I've configured this Vertical Bar Chart visualization, but substitute your latency field for the machine.ram field I'm using:
Note how the X axis is using a date histogram. This is how you create buckets of documents over time. And for the Y axis I can do an aggregation on any field I want. This is where you can calculate the average latency. Does this help?
Thanks for your reply @cjcenizal. I'm likely not explaining myself clearly. I'm not interested in a time-based x-axis.
What I'm trying to achieve is, for example:
Given buckets of per-minute request counts (e.g. 5k intervals from 0 to 100k) on the x-axis, what is the latency on the y-axis?
For example:
I want to know: for any minute in the past hour (scope) which had between 5k-10k requests, what was the average latency over that minute?
In this example, the x-axis would be buckets of the raw count of docs per minute, in 5k intervals, so multiple different minutes may fall into the same bucket. I don't care how many minutes had between 5k-10k requests; I want the average latency across all of the minutes that had between 5k-10k requests. Those could be minutes 12:05, 12:25, and 12:53. Which minutes they were doesn't need to be displayed; the average latency for those particular minutes is what I'm looking for on the y-axis.
The date histogram as you've outlined it isn't helpful in this scenario, as I'm not concerned with the trend over time per se.
Please let me know if I'm still not being clear.
Edit: I asked a colleague how to better outline my issue: "We have a webserver log, each row with a timestamp, and a latency number for how long it took to service that request. We want to graph the latency on the Y-axis, and the load (defined as number of requests in that minute), on the X-axis. How do we approach this?"
OK, I think we're getting closer to figuring this out. From what you and your colleague are saying, it sounds like time is really an important component here. For any given minute, you want to know:
A) how many requests there were (and since each request is represented by a document, this is the same as the number of documents there are in that minute)
B) the average latency value for all of the requests in that minute
Because time is an important component here, I still think you want to use a Date Histogram and then adjust the timepicker to choose the time range you want to visualize. I can't think of any other way to define a range for the x-axis. The original visualization I proposed satisfied B, but not A. We can layer on additional data in this visualization to satisfy A as well. Take a look at the screenshot below, again replacing "machine.ram" with "latency":
On the left axis you'll have the average latency and on the right axis you'll have count. You can see both values for any given minute. You can use the timepicker to specify any time range you want to display in this visualization.
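For anyone who wants the same per-minute data outside Kibana, the Date Histogram plus average-latency setup described above corresponds roughly to the Elasticsearch request body below, sketched as a Python dict. This is a sketch under assumptions: the timestamp field is `@timestamp`, the latency field is `latency`, and `interval` is the 6.x spelling of the date_histogram parameter (newer versions use `fixed_interval`). Each returned bucket's `doc_count` gives A, and the `avg_latency` sub-aggregation gives B.

```python
import json


def build_per_minute_query(latency_field="latency", time_field="@timestamp",
                           start="now-1h", end="now"):
    """Request body: one bucket per minute, with average latency per bucket.

    Field names are assumptions; substitute the ones from your own mapping.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the individual hits
        "query": {"range": {time_field: {"gte": start, "lt": end}}},
        "aggs": {
            "per_minute": {
                # "interval" is the 6.x parameter; 7.x+ uses "fixed_interval"
                "date_histogram": {"field": time_field, "interval": "1m"},
                "aggs": {
                    # B: average latency of the requests in this minute
                    # (A, the request count, is each bucket's doc_count)
                    "avg_latency": {"avg": {"field": latency_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_per_minute_query(), indent=2))
```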
This is great! Almost exactly what I was looking for! I have easily replicated this and should have got this far on my own, thanks for your help!
The last ask that would really drive this home would be to aggregate on count. Using the example you provided, I would like to combine minutes 14:20 and 14:22, as they both have a count between 4 and 5.
See what I'm saying? The date histogram helps create the proper buckets for me to aggregate on, but I need to aggregate on those aggregations. Maybe this is a use case for Timelion? I'm not familiar enough with that module yet.
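One possible workaround for the "aggregate on those aggregations" step is to do the second pass client-side: fetch the per-minute buckets (doc count plus average latency) and regroup them by load. The sketch below uses hypothetical helpers, not a Kibana or Elasticsearch feature, and assumes the response comes from a `date_histogram` named `per_minute` with an `avg` sub-aggregation named `avg_latency` (both names are assumptions). It weights each minute by its request count, so each value is the average latency over all requests in the minutes that fall into that load bucket; a plain mean of the per-minute averages would be the other reasonable choice.

```python
from collections import defaultdict


def minutes_from_response(response, agg_name="per_minute", avg_name="avg_latency"):
    """Extract (doc_count, avg_latency) pairs from a date_histogram response.

    agg_name and avg_name are hypothetical; match them to your own request.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["doc_count"], b[avg_name]["value"]) for b in buckets]


def latency_by_load(per_minute, bucket_width=5000):
    """Group minutes by request count; return {bucket_start: avg latency}.

    Each minute is weighted by its request count, so the value is the
    average latency of every request in the minutes that land in the bucket.
    """
    totals = defaultdict(lambda: [0, 0.0])  # bucket_start -> [requests, latency sum]
    for count, avg_latency in per_minute:
        if count == 0 or avg_latency is None:  # empty minute: avg value is null
            continue
        bucket_start = (count // bucket_width) * bucket_width
        totals[bucket_start][0] += count
        totals[bucket_start][1] += count * avg_latency
    return {b: total / n for b, (n, total) in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up sample in the shape Elasticsearch returns; not real data.
    sample = {"aggregations": {"per_minute": {"buckets": [
        {"key": 0, "doc_count": 6200, "avg_latency": {"value": 40.0}},
        {"key": 60000, "doc_count": 9100, "avg_latency": {"value": 50.0}},
        {"key": 120000, "doc_count": 12000, "avg_latency": {"value": 80.0}},
        {"key": 180000, "doc_count": 0, "avg_latency": {"value": None}},
    ]}}}
    for start, latency in latency_by_load(minutes_from_response(sample)).items():
        print(f"{start}-{start + 5000} req/min: {latency:.1f} ms avg latency")
```

Changing `bucket_width` gives the 5k or 10k load intervals discussed in this thread.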
Ohhhh! Haha, now I see what you're saying. Yes, what you're asking for makes total sense to me now -- sorry, I'm not sure why I didn't pick up on this earlier.
I pinged one of our visualizations engineers, @thomasneirynck, about this and it's a tricky problem. We're certain that neither Timelion nor Time Series Visual Builder can help you with this because they're both geared towards time-based visualizations.
We're looking into Pipeline aggregations as a possible tool for re-aggregating your buckets, but it'll take some time to work through. We'll follow up here once we have some more info for you.
Seth, I'm afraid I have some bad news for you. Unfortunately, it doesn't look like ES will support the kind of query you'd need for the visualization you want. Hopefully ES evolves to the point where it supports this kind of query and then we can build a UI for it in Kibana. Sorry, I hope this isn't a deal-breaker for your team.
@cjcenizal thank you for your diligence in seeing my request through. I understand that this is a tricky ask, and I look forward to Elastic being able to handle complex queries such as this in the future. The last post you provided does give substantial insight; although it takes some manual work, it will give us the data we need. I appreciate all of your efforts!