I bought Collier's great book, read the Prelert paper on APM monitoring, and then went through the ml-cpp code on GitHub...
A tiny spark of curiosity lit up inside me about the method:
ProbabilityAggregators::CJointProbabilityOfLessLikelySamples::calculate
Is it correct to say that this method answers a question like: "If I have 3 samples with values 0.4, 0.2 and 0.3, what's the probability of seeing a set of n samples with a lower joint probability than that of these 3 samples?"
Am I reading this the right way?
I'm confused because the description in the GitHub code states the following:
We want to find the probability of seeing a more extreme collection
of independent samples {y(1), y(2), ... , y(n)} than a specified (1)
collection {x(1), x(2), ... , x(n)}, where this is defined as the
probability of the set R:
{ {y(i)} : Product_j{ L(y(j)) } <= Product_j{ L(y(j)) } }. (2)
We will assume that y(i) ~ N(m(i), v(i)). In this case, it is possible
to show that:
P(R) = gi(n/2, Sum_i{ z(i)^2 } / 2) / g(n/2) (3)
where,
z(i) = (x(i) - m(i)) / v(i) ^ (1/2).
gi(., .) is the upper incomplete gamma function.
g(.) is the gamma function.
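To make sure I read (3) correctly, here is my own minimal Python sketch of it (standard library only; the function names are mine, not from ml-cpp). It uses the recurrence Q(a+1, x) = Q(a, x) + x^a * exp(-x) / gamma(a+1) to build the regularized upper incomplete gamma for half-integer a:

```python
import math

def upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x) = gi(a, x) / g(a),
    for a a positive multiple of 1/2, via the recurrence
    Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / gamma(a + 1)."""
    if a % 1 == 0.5:
        q, b = math.erfc(math.sqrt(x)), 0.5   # base case Q(1/2, x)
    else:
        q, b = math.exp(-x), 1.0              # base case Q(1, x)
    while b < a:
        q += x ** b * math.exp(-x) / math.gamma(b + 1)
        b += 1.0
    return q

def joint_probability(samples, means, variances):
    """P(R) = gi(n/2, Sum_i{ z(i)^2 } / 2) / g(n/2),
    with z(i) = (x(i) - m(i)) / v(i)^(1/2), as in (3)."""
    s = sum((x - m) ** 2 / v for x, m, v in zip(samples, means, variances))
    return upper_gamma_q(len(samples) / 2.0, s / 2.0)
```

For example, if the means were 0 and the variances 1 (my assumption, just for illustration), my three values would give `joint_probability([0.4, 0.2, 0.3], [0.0]*3, [1.0]*3)`.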
(1) Looking at the unit test, this probability seems to be multiplied by the number of trials (n == 20000) to get the expected count of sample sets with a lower probability than the (n == 3) set tested, but according to the description n should be 3, as in {y(1), y(2), y(3)}... what am I getting wrong here?
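On footnote (1), my current reading, which I'd like confirmed, is that n in the formula is the size of the sample set (3 here), while 20000 is the number of independent trials in the test, and P(R) times the trial count gives an expected count. A throwaway Monte Carlo sketch of that reading (my own code, not from the unit test, assuming standard normal y(i) and my example values 0.4, 0.2, 0.3):

```python
import random

def joint_log_likelihood(samples):
    # For standard normals the joint log-likelihood is monotone
    # in -sum(z^2), so comparing this is enough to rank likelihoods.
    return -sum(x * x for x in samples)

random.seed(0)
observed = [0.4, 0.2, 0.3]                 # the n == 3 set in question
threshold = joint_log_likelihood(observed)
trials = 20000                             # the 20000 trials from the unit test
count = sum(
    1 for _ in range(trials)
    if joint_log_likelihood([random.gauss(0, 1) for _ in range(3)]) <= threshold
)
# count / trials should approximate P(R) = gi(3/2, 0.29 / 2) / g(3/2)
print(count, count / trials)
```

If that reading is right, the fraction of "more extreme" 3-sample sets should match P(R) computed from (3), and count is the expected-count figure the unit test compares against.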
(2) The left- and right-hand sides of (2) look identical to me; is that intentional, or should one of them be Product_j{ L(x(j)) }, i.e. the specified collection?
(3) Is it possible to get some pointers to the theorem that proves this?
I would be very happy if you could lend me a hand with this statistical knot that's tightening in my brain...