JLH score for significant terms

ChrisR · September 30, 2016, 2:44pm

When using the JLH score to rank significance, how does the absolute change in popularity favor common terms while the relative change favors rare terms, as stated in the documentation?

I ran through several concrete exercises and I cannot see how either absolute or relative changes in percentages favor rare over common.

Is there a simple concrete example that can illustrate this?

softwaredoug · September 30, 2016, 2:56pm

I laid out some concrete examples in my recommendations blog article, which is one area where JLH really shines.

Mark Harwood, the creator of JLH, sums it up as "fleas jump higher than elephants." Basically, what JLH attempts to do is detect the magnitude of change from the global collection to the local search results. Large things are not expected to grow as much. As Mark says, it's unlikely Microsoft grows 100 fold. Little things are expected to change a lot: startups grow 100 fold.

There are a lot of other considerations too. You don't want to bias scoring towards extremely rare events that aren't statistically confident. A single co-occurrence could just be a one-off, as I write about here. So it's good to set a min doc frequency for JLH.
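
As a rough sketch of that last point, a significant_terms request can combine the JLH heuristic with a minimum document count. The field name `liked_movies`, the aggregation name, and the threshold of 10 are hypothetical choices for illustration, not values from the blog article:

```python
import json

# Hypothetical request body: field and aggregation names are placeholders.
request_body = {
    "query": {"match": {"liked_movies": "Rocky IV"}},
    "aggs": {
        "significant_movies": {
            "significant_terms": {
                "field": "liked_movies",
                # Ignore terms that appear in fewer than 10 foreground docs,
                # so one-off co-occurrences are not scored at all.
                "min_doc_count": 10,
                # Select the JLH heuristic explicitly (it takes no options).
                "jlh": {},
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The right threshold depends on the size of the data set; 10 here is only an assumption.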

ChrisR · September 30, 2016, 4:03pm

That Rocky IV and Good Will Hunting example is perfect. The 7% foreground and 4% background versus the 35% foreground and 20% background is a good example of how the absolute difference adds weight to the more common occurrences.
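
To make the arithmetic concrete, here is a minimal sketch assuming JLH is the absolute change multiplied by the relative change, as the documentation describes. The function name `jlh_score` is made up for this post, and the third "rare term" case uses invented percentages that are not from the blog article:

```python
def jlh_score(foreground_pct: float, background_pct: float) -> float:
    """Sketch of JLH: (absolute change) * (relative change).

    Both arguments are fractions of documents containing the term,
    e.g. 0.07 for 7%. Assumes background_pct > 0.
    """
    if background_pct <= 0:
        raise ValueError("background_pct must be positive")
    absolute_change = foreground_pct - background_pct
    if absolute_change <= 0:
        # Simplification: terms that are not more common in the
        # foreground are treated as not significant.
        return 0.0
    relative_change = foreground_pct / background_pct
    return absolute_change * relative_change


# The two cases from this thread: both have the same relative change (1.75x),
# so only the absolute change separates them.
less_common = jlh_score(0.07, 0.04)   # absolute change 0.03
more_common = jlh_score(0.35, 0.20)   # absolute change 0.15

# Invented rare term: 0.01% background, 1% foreground (100x relative change).
rare_term = jlh_score(0.01, 0.0001)

print(less_common, more_common, rare_term)
```

With equal relative change, the more common term scores about five times higher (roughly 0.26 versus 0.05), which is the absolute difference at work. The invented rare term scores highest of the three only because of its 100x relative jump; its absolute change is under 1%.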

Thanks.
