Best way to visualize Canary / AB comparison deployment

I've been trying to figure out the best way to visualize the differences (beyond just eyeballing them) in a typical A/B test. The setup:

  • a list of server pairs is selected, each pair within a given site. Each server gets a canary flag: "1" for A (canary) and "0" for B (control).
  • each server reports a bunch of metrics (rps, network, cpu, etc.). Some are gauge values (current state), others are ever-increasing counters (which need a diff from the previous value).

Documents look like:
{
  "server": "c1.nflxvideo.com",
  "site": "sjc005",
  "canary": "1",
  "requests": 15600,
  "cpu": 87,
  ...
}

It is possible to compute the global A/B diff for a metric in Visual Builder (TSVB) by doing the following (a rough Timelion equivalent is sketched after the list):

  • group by server
  • Avg of the metric (gauge) or Max + Derivative (counter)
  • a side metric holding a "canary factor", a 1/-1 value derived from the canary flag (it would be great to have this as a scripted field, but Visual Builder doesn't support them)
  • a Calculation agg that multiplies the metric by the canary factor
  • a Series agg with a Sum of all the lines
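
For reference, a rough Timelion equivalent of that global diff for a gauge metric like cpu, assuming an index pattern metrics-* and the field names from the document above (not identical, since TSVB sums per-server averages while this takes one average per group, but it shows the same global A-minus-B signal):

.es(index='metrics-*', q='canary:1', metric='avg:cpu')
  .subtract(.es(index='metrics-*', q='canary:0', metric='avg:cpu'))
  .label('A-B cpu (global)')

A counter like requests would instead need metric='max:requests' plus .derivative(), which really has to happen per server; that is exactly why the per-server grouping above matters.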

But to be really useful, the graph would need to be split by pair, so that one can quickly see which pair is deviating.
I don't see a way to do this today.

Timelion may be able to do this, since it can combine series coming from separate queries:

.es(select A servers, split on site).doSomeCalc().subtract(.es(select B servers, split on site).doSomeCalc())
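
For a single pair this is already expressible today. A minimal sketch for the requests counter, assuming the metrics-* index pattern, the fields from the example document, and one A and one B server per site:

.es(index='metrics-*', q='site:sjc005 AND canary:1', metric='max:requests').derivative()
  .subtract(.es(index='metrics-*', q='site:sjc005 AND canary:0', metric='max:requests').derivative())
  .label('sjc005 A-B requests')

The catch: as far as I can tell, Timelion's math functions (.subtract(), .multiply(), ...) only accept a single series as an argument, so two series lists produced by split='site:N' cannot be subtracted pairwise in one expression.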

Any help appreciated; this would be super handy, since A/B tests are a core way we deploy things.

Hm. I'm not sure I completely follow your setup.

You've got some data from some A/B tests you ran. Presumably, you want to evaluate the differences between the results coming in from A and B. I think I'd do this as a dashboard with separate visualizations for "A" and "B". e.g. an A gauge and a B gauge for data that fits the gauge visualization, and an A table and a B table for tabular data, etc. Then, I'd place those side-by-side to get some idea of how the two tests compare.

Does that make sense?

I find that things get hairy when you have too many interconnected, nested aggs, etc. in Kibana visualizations, so I'd try to think of a simpler compromise.

Side by side would be too hard for the eyes to compare. The main thing in A/B testing is that each pair may have a very different pattern from the other pairs, and there can be dozens of pairs.
Getting each pair's diff onto the same graph would be key, so that you can quickly tell which one deviates on the positive or the negative side. It would save the operator hours at a time, since we have a lot of metrics.
Again, grouping by "pair name" instead of server would not let me do the derivative and the calculation (multiply by -1) per server.
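
The closest workaround I see today is one hand-written Timelion expression per pair, comma-separated so that every diff lands on the same chart. Tedious to maintain, but it gives the at-a-glance deviation view (sjc005 is from the example above; lax001 is a hypothetical second site):

.es(q='site:sjc005 AND canary:1', metric='avg:cpu').subtract(.es(q='site:sjc005 AND canary:0', metric='avg:cpu')).label('sjc005'),
.es(q='site:lax001 AND canary:1', metric='avg:cpu').subtract(.es(q='site:lax001 AND canary:0', metric='avg:cpu')).label('lax001')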
Thanks
