JLH score for significant terms


(Chris) #1

When using the JLH score to rank significance, how does the absolute change in popularity favor common terms while the relative change favors rare terms, as the documentation states?

I ran through several concrete exercises and I cannot see how either the absolute or the relative change in percentages favors rare terms over common ones.

Is there a simple concrete example that can illustrate this?


(Doug Turnbull) #2

I laid out some concrete examples in my blog article on recommendations, which is one area where JLH really shines.

To paraphrase Mark Harwood, the creator of JLH: "fleas jump higher than elephants." Basically, what JLH attempts to do is detect the magnitude of change between the global collection and the local search results. Large things are not expected to grow as much. As Mark says, it's unlikely Microsoft grows 100-fold. Little things are expected to change a lot; startups grow 100-fold.

There are other considerations too: you don't want to bias scoring toward extremely rare events that aren't statistically reliable. A single co-occurrence could just be a one-off, as I write about here. So it's good to set a minimum document frequency for JLH.
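
For what it's worth, here is a minimal sketch of that guard in Python. It assumes the JLH formula as the Elasticsearch docs describe it (absolute change multiplied by relative change). The `jlh` helper, its parameter names, the threshold of 3, and all of the counts are made up for illustration; this is not the actual implementation.

```python
def jlh(fg_count, fg_total, bg_count, bg_total, min_doc_count=3):
    """Hypothetical JLH helper with a minimum foreground document count."""
    # Skip terms seen too few times in the foreground to be trusted,
    # and terms missing from the background (avoids dividing by zero).
    if fg_count < min_doc_count or bg_count == 0:
        return 0.0
    fg = fg_count / fg_total   # foreground percentage
    bg = bg_count / bg_total   # background percentage
    absolute = fg - bg         # absolute change in popularity
    if absolute <= 0:
        return 0.0             # term is not more popular in the foreground
    relative = fg / bg         # relative change in popularity
    return absolute * relative


# Invented counts: a term seen once in 20 foreground docs and once in
# 1,000,000 background docs, versus a term with real support.
one_off = jlh(1, 20, 1, 1_000_000)
supported = jlh(8, 20, 2_000, 1_000_000)
unguarded = jlh(1, 20, 1, 1_000_000, min_doc_count=1)

print(one_off, supported, unguarded)
```

With the guard switched off (`min_doc_count=1`), the single co-occurrence outscores the well-supported term by a wide margin, which is exactly the bias a minimum document frequency is meant to prevent.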


(Chris) #3

That Rocky IV and Good Will Hunting example is perfect. Comparing 7% foreground and 4% background against 35% foreground and 20% background shows how the absolute difference adds weight to the more common occurrences.
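
To spell out the arithmetic, here is a small Python sketch, assuming JLH is the absolute change multiplied by the relative change as the documentation describes. The `jlh` function name is mine, and the third case (a very rare term) uses invented percentages just to show the relative side of the trade-off.

```python
def jlh(fg, bg):
    """JLH as absolute change times relative change (percentages as fractions)."""
    return (fg - bg) * (fg / bg)


# Both terms from the example have the same relative change (1.75x),
# so only the absolute change separates them.
less_common = jlh(0.07, 0.04)   # absolute change: 3 points
more_common = jlh(0.35, 0.20)   # absolute change: 15 points

# Invented case: a very rare term with a small absolute change (about
# 1 point) but a 100x relative change.
rare_jump = jlh(0.01, 0.0001)

print(less_common, more_common, rare_jump)
```

The more common term scores about five times higher than the less common one purely because of the absolute difference, while the invented rare term can still win on the strength of its relative change.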

Thanks.

