Plot raw data


(Steve Earl) #1

Hi All,

I am using the elapsed-filter plugin to obtain processing time for transactions crossing a bunch of services. I've been able to create a time-based histogram of this data in which the Y-axis shows the average processing time for each time-bucket that Kibana has scaled the graph into.

This is OK but isn't really useful for viewing outlier transactions, as they completely disappear due to the averaging. Ideally, because I have a few hundred/few thousand measurements (rather than millions) I would prefer to show the raw data on the Y-axis, plotted against time on the X-axis.

Does anyone know if this is possible in Kibana and if so how to do it - I haven't been able to figure this out, as I seem to have to apply an aggregation to any field I select on the Y-axis.

Alternatively, how have people got around the problem of averaging smoothing out outlying data? I can't imagine this problem is particularly unusual.

thanks in advance,
Steve


(Tim Roes) #2

Hi Steve,

If I understand you correctly, you would like to draw every document individually in the graph, i.e. as a dot positioned by its timestamp on the x-axis and by the value of the relevant field on the y-axis? That is not possible, since it could require drawing a very large number of points, which even modern browsers wouldn't be able to cope with.

But there should be a solution for the smoothing you want to avoid.
Besides the "average" metric aggregation, you could also add the "max" aggregation on the same field. That way you would get a second line showing the maximum request time in each time slot.

You could also use the "percentile" aggregation to get more insight into the distribution. For example, by showing the 50th and 95th percentiles, you would get one line showing the value below which 50% of the documents fall (i.e. the median) and another line showing the value below which 95% of all documents fall. I've shown this as an example in the following screenshot (with transferred bytes instead of the response time).
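To make the difference between these aggregations concrete outside Kibana, here is a minimal Python sketch. It is only an illustration with made-up data, not how Kibana or Elasticsearch compute these values internally; the function names `percentile` and `summarize` are hypothetical, and the percentile uses the simple nearest-rank definition.

```python
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(events, bucket_seconds):
    """Group (timestamp, elapsed) pairs into fixed time buckets and compute avg, p50, p95, max."""
    buckets = defaultdict(list)
    for ts, elapsed in events:
        # Bucket key is the start of the interval the timestamp falls into
        buckets[ts - ts % bucket_seconds].append(elapsed)
    return {
        start: {
            "avg": sum(vals) / len(vals),
            "p50": percentile(vals, 50),
            "p95": percentile(vals, 95),
            "max": max(vals),
        }
        for start, vals in sorted(buckets.items())
    }

# Made-up data: 18 fast calls and 2 slow outliers inside the same 60-second bucket
events = [(i, 2.0) for i in range(18)] + [(30, 60.0), (45, 60.0)]
stats = summarize(events, 60)
print(stats[0])
```

With this data the average stays fairly low while the 95th percentile and the max expose the slow calls, which is the effect the extra lines in the chart are meant to show.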

Cheers,
Tim


(Steve Earl) #3

Hi Tim,

Thanks for the quick response. I can see where you're coming from with your comments around situations where the number of raw data points grows too large to be displayed.

Your suggestions make a lot of sense but I wondered if you could comment on how you would best visualise this with the Timelion chart I've shown below:

In my case I have 10 different sets of data displayed on the same chart, each of which represents the average response time for a different web service operation. Each operation has a different response time SLA (as some may be much more process-intensive than others) so the idea of the scatterplot was to be able to detect outliers.

I think your idea of having median and 95% outliers is a really good idea, but my concern would be applying this to the chart may make it unviewable (as I'd end up with three or four lines per data series, so 30 or 40 lines in all). I know that I can hide data series in Timelion easily enough by clicking on the legend item, but that would be a little laborious for any viewer as they'd have to click a lot of individual legend entries.

The current version I have now was trying to represent the SLAs for all the operations on a single chart. Perhaps that's not really a viable approach when we start including additional series representing median and 95%?

I appreciate none of the above is an issue with Kibana or Timelion per se (Timelion is very cool by the way - managing to figure out how to get it to generate automatic legends using a regex on a field was a good day!) but I just wondered if you could provide any insight on how you would best represent my data without ending up with either (a) a single cluttered chart that is not particularly readable or (b) a series of 10 individual charts that are readable but don't fit on a single dashboard anymore.

Best regards & thanks again for the help,
Steve


(Steve Earl) #4

Hi Tim,

As a follow up, when I plot my data as a line rather than use points(), I get a pretty horrible looking graph (see screenshot below):

Is this because my data isn't recorded at regular time intervals but instead represents calls by users to a web service? This means there may be significant periods of time (overnight for example) where there are no events.

Whilst I appreciate the chart probably isn't "wrong", is there a standard approach to make it more understandable? I guess the question here is how best to represent "no data", as simply showing a value of zero response time is definitely not the correct solution either.

At the moment my chart looks like my attempt at painting when I was a 3-year old.... :grinning:

Regards,
Steve


(Tim Roes) #5

Hi Steve,

Regarding interrupted charts

It's just not as easy as it looks, because what would be the correct approach in this situation? We could pull the line down to zero for time slots without data. But that could lead to false conclusions, because how could viewers tell those slots apart from actual buckets with a value of 0? Another approach would be to skip those buckets when drawing the line and connect it directly to the next existing point (i.e. interpolate over the empty buckets). That would also be confusing, since it suggests that data exists between the two points. Depending on what you graph, that could make sense or could lead to totally wrong conclusions.

So basically, without knowing the domain of the data, there is no better way to draw this without hiding facts about the data (missing buckets). Thus the best solution is to let the user choose the null-bucket behavior. In Timelion we can already do this with the fit parameter. In TSVB there is a long-running issue about this with some discussion (#11793), and for the classical visualizations there is no way at the moment. So if you are handling a lot of that kind of data, I would recommend using Timelion for now. (By the way, kudos for your awesome statistical drawing skills as a child :stuck_out_tongue_winking_eye:)
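To illustrate the trade-off between these options, here is a small Python sketch of three ways to treat empty buckets. It is only a model of the idea; the function `fill_gaps` and its mode names are hypothetical labels chosen for this example and are not Timelion's implementation.

```python
def fill_gaps(series, mode):
    """Return a copy of series (floats, with None for empty buckets) handled per mode.

    "none":  keep gaps as None, so a line chart is visibly interrupted
    "zero":  replace gaps with 0.0, which can be mistaken for a real value of 0
    "carry": repeat the last known value, which suggests data that does not exist
    """
    if mode == "none":
        return list(series)
    if mode == "zero":
        return [0.0 if v is None else v for v in series]
    if mode == "carry":
        filled, last = [], None
        for v in series:
            if v is not None:
                last = v
            filled.append(last)  # stays None until a first real value has been seen
        return filled
    raise ValueError(f"unknown mode: {mode}")

# Made-up hourly averages with two empty buckets (e.g. no calls overnight)
hourly_avg = [3.1, 2.9, None, None, 3.4]
for mode in ("none", "zero", "carry"):
    print(mode, fill_gaps(hourly_avg, mode))
```

None of the three outputs is "correct" in general; which one is least misleading depends on what the data represents, which is why the choice is best left to the person building the chart.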

Regarding the timelion question

I would totally agree with you that 30 lines are not really distinguishable anymore, and that one chart per operation might also be too much (even though I would consider it the better of those two approaches).

If each of those operations has a different SLA, it might be hard to visualize this within one graph, since what is an outlier for one operation might be within the average range of other operations, and would thus just get lost among the values of all the other APIs.

As a suggestion: if you know the concrete values for your SLAs, you could highlight only the values that exceed them, to make them easily detectable even in a graph with 10 different operations.

That could look as follows, with a line-by-line description of what it does:

.es(opname:operation1, metric=avg:elapsed_time) // filter only for the data from that operation
  .if(gt, 15, null) // if a value is greater than 15 (your SLA), remove it from the graph
  .points(0.2).color(tomato), // draw the values that are within your SLA as very small points (radius 0.2)

.es(opname:operation1, metric=avg:elapsed_time) // filter again to draw only outliers
  .if(lte, 15, null) // remove all the data within the SLA (that is already drawn above) from the graph
  .points(fill=10, fillColor=tomato).color(tomato) // draw fully filled bullets for these outliers so they stand out

You can now take those two expressions and repeat them for all ten of your operations, and you would get a graph that shows outliers in a different size. The following example shows the same, using request logs and countries with different thresholds:

You can also draw a threshold line for each API with .static(15).lines(width=1).color(tomato).
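Since repeating the two expressions by hand for ten operations is tedious, here is a rough Python sketch that builds the combined expression as a string to paste into Timelion. The helper `outlier_expression`, the second operation name, its threshold, and its color are all hypothetical; only the expression pattern itself comes from the example above.

```python
def outlier_expression(field, value, metric, sla, color):
    """Build the two-series expression (within SLA, above SLA) for one operation."""
    source = f".es({field}:{value}, metric={metric})"
    # Values above the SLA are nulled out, the rest is drawn as very small points
    within = f"{source}.if(gt, {sla}, null).points(0.2).color({color})"
    # Values within the SLA are nulled out, the outliers are drawn as filled bullets
    outliers = (
        f"{source}.if(lte, {sla}, null)"
        f".points(fill=10, fillColor={color}).color({color})"
    )
    return f"{within},\n{outliers}"

# Hypothetical operations mapped to (SLA threshold, color)
operations = {
    "operation1": (15, "tomato"),
    "operation2": (40, "steelblue"),
}

expression = ",\n".join(
    outlier_expression("opname", name, "avg:elapsed_time", sla, color)
    for name, (sla, color) in operations.items()
)
print(expression)
```

Keeping the thresholds in one dictionary means a changed SLA only has to be edited in one place before regenerating the expression.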

If you want to draw both average and max in the graph, you could also pass multiple metric parameters to the .es function. Unfortunately Timelion doesn't support the percentile aggregation yet, so you are bound to the basic metrics like avg, max, etc.

Cheers,
Tim


(Steve Earl) #6

Hi Tim,

Thanks for the quick and comprehensive set of suggestions in your response (and for the appreciation of my early artwork - sadly it's all been downhill from there :smiley:).

I'll work through your mail, give things a try and see what gives me the best looking solution. Perhaps something based on points (rather than line) but showing the max/percentile might be the best approach. The line chart which shows what appears to be a 'broken line' may just cause more confusion with my users as to what they're looking at...!

Just out of interest, do you know if there is a plan to add percentiles to Timelion?

Regards (& thanks again),
Steve


(Tim Roes) #7

The open issue for implementing percentiles in Timelion is #8953, though I don't think anyone is actively working on this at the moment.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.