Understanding Joint Probability of Less Likely Samples :: calculate method

I bought great Collier's book, I've read the Prelert paper about APM monitoring, and then I've been trough the ML-CPP code at github...

A tiny sense of curiosity sparkled inside me about the method:


is it correct to say that this method answer the (eg) question "If I have 3 samples with value 0,4; 0,2 and 0,3; what's the probability of seeing a set of n samples with a lower joint probability than that of this joint 3 samples?"
Am I getting this the right way?

I'm confused because the github code states in its description the following:

We want to find the probability of seeing a more extreme collection
of independent samples {y(1), y(2), ... , y(n)} than a specified (1)
collection {x(1), x(2), ... , x(n)}, where this is defined as the
probability of the set R:
{ {y(i)} : Product_j{ L(y(j)) } <= Product_j{ L(y(j)) } }. (2)

We will assume that y(i) ~ N(m(i), v(i)). In this case, it is possible
to show that:
P(R) = gi(n/2, Sum_i{ z(i)^2 } / 2) / g(n/2) (3)

z(i) = (x(i) - m(i)) / v(i) ^ (1/2).
gi(., .) is the upper incomplete gamma function.
g(.) is the gamma function.

(1) Looking at the unit test, it seems this probability is then multiplied by the sample size (n==20000) in order to get the expected count of samples with lower probability than the (n==3) set in the case tested, but according to description it should be n==3, as {y(1), y(2), y(3)}... what am I getting wrong in this?

(2) Left and Right hands seems equal to me, is that ok?

(3) Is it possible to get some info about the theorem that probes this?

I would be very happy if you could lend me a hand with this stat choke that's cranking up in my brain...

This is a typo in the github comment: it should read { y(i) : \prod_{i}{ L(y(i)) } <= \prod_{i}{ L(x(i)) } }, i.e. the set of all points for which the joint likelihood is less than the joint likelihood of the values {x(i)} just observed. If the likelihoods are normal then this is just the integral of the multivariate normal distribution outside the ellipsoid, i.e. for which (y - m)^t \Sigma^{-1} (y - m) >= (x - m)^t \Sigma^{-1} (x - m). (In fact, even if they are non-normal, but the tails aren't too heavy, this often gives something remarkably close to the true value as per the definition.) The fact that this is expressible as an incomplete gamma function is a standard result, for example see here and particularly this reference. The equations in that reference aren't numbered so I can't link the exact one, but the relevant one is towards the end, i.e. F(r) = \int_0^{r^2} x^{(n-2)/2}e^{-x/2}/(2^{n/2}\Gamma(n/2)) dx and compare with the definition of the gamma function. To get a sense of what this is doing, have a look at this test which compares this to the value calculated by sampling.


Tom & Rich, many thanks for the fast, kind & comprehensive reply! My stat choke on my brain passed all the way out... :stuck_out_tongue:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.