I have a use case where I compare input titles against a corpus of titles to determine whether I have seen a title before. The default scoring of a match query on the title field works great for ranking, but I also want to return a score that compares the input title with each suggested title.
My thinking is that I could add a script_fields entry that computes a difference score weighted by how many titles contain each term, as follows, but I'm in the weeds on this one and not sure where to start. Can I script this as a field?
    get terms from corpus title "corpus_terms"
    get terms from input title "input_terms"

    let all_terms = union(corpus_terms, input_terms)
    let total = 0
    let total_pcts = 0
    for term in all_terms:
        let weight = 0.0
        if term in input_terms:
            if term in corpus_terms:
                weight = 1.0
            else:
                weight = 0.0
        else:
            # Slight penalty for terms in corpus title not appearing in input_terms
            weight = -0.001
        let pct_of_titles_with_term = index[term].doc_count() / index.doc_count()
        total_pcts += pct_of_titles_with_term
        total += (pct_of_titles_with_term * weight)
    let difference_from_0_to_1 = total / total_pcts
    return difference_from_0_to_1
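To check the scoring logic before worrying about how to express it as a script, here is a minimal client-side sketch of the pseudocode in Python. It assumes lowercase whitespace splitting as a stand-in for the title field's analyzer, and it replaces `index[term].doc_count()` with an in-memory count of how many corpus titles contain each term. The names `tokenize`, `doc_freqs`, and `title_similarity` are hypothetical helpers, not Elasticsearch APIs. I am not certain a script_fields script can read index-wide document counts, so this version would run over the top hits after the search returns.

```python
from collections import Counter
from typing import Iterable


def tokenize(title: str) -> set[str]:
    # Assumption: lowercase + whitespace split stands in for the field's analyzer.
    return set(title.lower().split())


def doc_freqs(corpus: Iterable[str]) -> tuple[Counter, int]:
    # Count how many titles contain each term (stand-in for index[term].doc_count())
    # and how many titles there are in total (stand-in for index.doc_count()).
    df: Counter = Counter()
    n_docs = 0
    for title in corpus:
        df.update(tokenize(title))  # each term counted at most once per title
        n_docs += 1
    return df, n_docs


def title_similarity(input_title: str, corpus_title: str,
                     df: Counter, n_docs: int, penalty: float = -0.001) -> float:
    input_terms = tokenize(input_title)
    corpus_terms = tokenize(corpus_title)
    total = 0.0
    total_pcts = 0.0
    for term in input_terms | corpus_terms:
        if term in input_terms:
            # Shared term scores 1.0; input-only term scores 0.0.
            weight = 1.0 if term in corpus_terms else 0.0
        else:
            # Slight penalty for corpus-title terms missing from the input.
            weight = penalty
        pct = df[term] / n_docs  # fraction of titles containing the term
        total_pcts += pct
        total += pct * weight
    # Guard against dividing by zero when no term appears in the corpus.
    return total / total_pcts if total_pcts else 0.0


if __name__ == "__main__":
    corpus = ["the quick brown fox", "a quick brown dog", "lazy dog sleeps"]
    df, n = doc_freqs(corpus)
    print(title_similarity("quick brown fox", "the quick brown fox", df, n))
```

This follows the pseudocode literally, so terms that appear in many titles carry more weight than rare ones, and the penalty can push the result slightly below 0 when nothing overlaps. Weighting by inverse document frequency instead would make rare shared terms count more.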