Hi Steve,
Regarding interrupted charts
It's just not as easy as it looks, because what would be the correct approach in this situation? We could pull the line down to zero for time slots without data, but that could lead to false conclusions, because users could no longer distinguish those slots from buckets that actually have a value of 0. Another approach would be to skip those buckets when drawing the line and connect directly to the next existing point (i.e. interpolate over the empty buckets). That would also be confusing, since it suggests that data exists between the two points. Depending on what you graph, that could make sense or could lead to totally wrong conclusions.
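To make the trade-off concrete, here is a minimal Python sketch (not Timelion code) of the two naive strategies. It assumes a hypothetical series in which `None` marks an empty bucket and `0.0` marks a bucket that really recorded zero. `fill_zero` and `interpolate` are illustrative names only.

```python
from typing import List, Optional

# Hypothetical bucketed series: None = no documents in that time bucket,
# 0.0 = a bucket that really recorded a value of zero.
Series = List[Optional[float]]


def fill_zero(series: Series) -> List[float]:
    """Pull the line down to zero for every empty bucket."""
    return [0.0 if v is None else v for v in series]


def interpolate(series: Series) -> Series:
    """Connect existing points linearly across empty buckets.

    Leading and trailing gaps have no neighbour on one side, so they stay None.
    """
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    # Walk over each pair of consecutive known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span
            out[i] = series[left] + t * (series[right] - series[left])
    return out
```

With `[4.0, None, 0.0, None, None, 8.0]`, `fill_zero` returns `[4.0, 0.0, 0.0, 0.0, 0.0, 8.0]`, so the empty bucket 1 looks exactly like the real zero in bucket 2. `interpolate` puts `2.0` into bucket 1, a value that was never measured.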
So basically, without knowing the domain of the data, there is no way to draw this that doesn't risk hiding facts about the data (the missing buckets). Thus the best solution is to let the user choose the null-bucket behavior. In Timelion you can already do this with the `.fit()` function. In TSVB there is a long issue about this, with some discussion, in #11793, and for the classic visualizations there is no way to do it at the moment. So if you are handling a lot of that kind of data, I would recommend using Timelion for now. (By the way, kudos for your awesome statistical drawing skills as a child.)
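Letting the user choose boils down to making the gap policy a parameter. Below is a minimal Python sketch using the same `None`-for-empty-bucket representation. `fill_gaps` and its mode names are hypothetical and only illustrate the idea; this is not how Timelion implements it.

```python
from typing import List, Optional

Series = List[Optional[float]]


def fill_gaps(series: Series, mode: str = "none") -> Series:
    """Apply a user-chosen policy to empty (None) buckets.

    Modes (illustrative names, not a real API):
      "none"  - keep the gaps, so the chart shows an interrupted line
      "zero"  - treat empty buckets as 0
      "carry" - repeat the last known value until new data arrives
    """
    if mode == "none":
        return list(series)
    if mode == "zero":
        return [0.0 if v is None else v for v in series]
    if mode == "carry":
        out: Series = []
        last: Optional[float] = None
        for v in series:
            if v is not None:
                last = v
            out.append(last)  # stays None until the first real value appears
        return out
    raise ValueError(f"unknown fill mode: {mode!r}")
```

`fill_gaps([None, 3.0, None, None, 5.0, None], "carry")` returns `[None, 3.0, 3.0, 3.0, 5.0, 5.0]`. With `"none"` the gaps survive and the missing buckets stay visible.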
Regarding the timelion question
I totally agree with you that 30 lines are no longer really distinguishable, and that one chart per operation might also be too much (even though I would consider it the better of the two approaches).
If each of those operations has a different SLA, it might be hard to visualize them within one graph, since what is an outlier for one operation might be within the normal range of another, and would thus just get lost among the values of all the other APIs.
As a suggestion: if you know the concrete values of your SLAs, you could highlight only the values that are above them, which makes them easy to spot even in a graph with 10 different operations.
That could look as follows, with a line-by-line description of what it does:
```
.es(q='opname:operation1', metric=avg:elapsed_time) // query only the data from that operation
.if(gt, 15, null) // if a value is greater than 15 (your SLA), remove it from this series
.points(radius=0.2).color(tomato), // draw the values within your SLA as very small points (radius 0.2)
.es(q='opname:operation1', metric=avg:elapsed_time) // query the same operation again, this time for the outliers
.if(lte, 15, null) // remove all values within the SLA (already drawn by the first expression)
.points(radius=5, fill=10, fillColor=tomato).color(tomato) // draw the outliers as large, fully filled points
```
You can now take those two expressions and repeat them for all ten of your operations, and you will get a graph that shows the outliers as larger points. The following example shows the same approach, using request logs and countries with different thresholds:
You can also draw a threshold line for each API with `.static(15).lines(width=1).color(tomato)`.
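The split done by the two `.if()` calls, repeated per operation with its own threshold, can be sketched in Python. The operation names, SLA values and sample numbers are hypothetical. Each value lands in exactly one of the two series, just like the `gt`/`lte` pair in the Timelion expressions.

```python
from typing import Dict, List, Optional, Tuple

Series = List[Optional[float]]


def split_by_sla(series: Series, sla: float) -> Tuple[Series, Series]:
    """Split one operation's series into (within_sla, outliers).

    within_sla blanks values > sla (drawn as small points);
    outliers blanks values <= sla (drawn as large points).
    Empty buckets stay None in both.
    """
    within = [v if v is not None and v <= sla else None for v in series]
    outliers = [v if v is not None and v > sla else None for v in series]
    return within, outliers


def outliers_per_operation(
    data: Dict[str, Series], slas: Dict[str, float]
) -> Dict[str, List[float]]:
    """Collect the outlier values of every operation, each against its own SLA."""
    result: Dict[str, List[float]] = {}
    for op, values in data.items():
        _, outliers = split_by_sla(values, slas[op])
        result[op] = [v for v in outliers if v is not None]
    return result
```

With `data = {"operation1": [12.0, 18.0, None, 14.0], "operation2": [25.0, 38.0, 45.0, None]}` and `slas = {"operation1": 15.0, "operation2": 40.0}`, the result is `{"operation1": [18.0], "operation2": [45.0]}`. Here 18.0 counts as an outlier for `operation1`, while 38.0 is still fine for `operation2`. That is why a single shared threshold would not work.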
If you want to draw both the average and the max in the graph, you can also pass multiple `metric` parameters to the `.es` function. Unfortunately Timelion doesn't support percentile aggregations yet, so you are limited to basic metrics like `avg`, `max`, etc.
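To see what each metric tells you about a single time bucket, here is a small Python sketch (not Timelion). It summarizes one hypothetical bucket of raw elapsed times with avg, max and a nearest-rank 90th percentile. The percentile definition used here is only one of several in common use. Elasticsearch's percentiles aggregation computes approximate values and may give slightly different numbers.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value covering p% of the data."""
    if not values:
        raise ValueError("empty bucket")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(bucket: List[float]) -> Dict[str, float]:
    """avg and max are what .es can return today; p90 is shown for comparison."""
    return {"avg": mean(bucket), "max": max(bucket), "p90": percentile(bucket, 90)}
```

For the bucket `[10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 11.0, 10.0, 12.0, 90.0]`, the average is 18.9. That is above an SLA of 15, even though nine of the ten requests took at most 12. `max` (90.0) exposes the single slow request, and the 90th percentile (12.0) describes what most requests experienced.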
Cheers,
Tim