Getting all terms/tokens for a field in a custom scoring script (in Java)


(Luca Weihs) #1

I have a question similar to the one asked here, but since that one has had no responses and is over 6 months old, I hope it is OK to ask again.

I would like to implement a custom scoring function (as a native script in Java) that takes as inputs a collection of common learning-to-rank features. As I understand custom scoring, all of the logic for this computation goes inside a method of a class extending AbstractSearchScript. In this context an IndexLookup object is accessible, which gives access to TF/IDF information for particular fields through its 'get' function (this returns an IndexField object).
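A minimal sketch of what the final scoring step could look like once the features are available, assuming a simple linear model (only one of many possible learning-to-rank models). This is plain Java with no Elasticsearch dependency; the class name `LinearRanker` is hypothetical, and the wiring into an AbstractSearchScript subclass is deliberately left out.

```java
/**
 * Hypothetical linear learning-to-rank scorer: score = bias + sum_i w_i * x_i.
 * Not an Elasticsearch API; it only shows the arithmetic a native script
 * might perform after it has computed its feature vector.
 */
final class LinearRanker {
    private final double[] weights;
    private final double bias;

    LinearRanker(double[] weights, double bias) {
        this.weights = weights.clone(); // defensive copy so callers cannot mutate the model
        this.bias = bias;
    }

    /** Weighted sum of the features; the feature count must match the weight count. */
    double score(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException(
                "expected " + weights.length + " features, got " + features.length);
        }
        double s = bias;
        for (int i = 0; i < features.length; i++) {
            s += weights[i] * features[i];
        }
        return s;
    }
}
```

For example, with weights {0.5, 0.25} and bias 0.0, the feature vector {1.0, 2.0} scores 1.0. A linear model is used here only because it keeps the sketch short.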

Computing the features I want requires knowing all of the terms/tokens for particular fields, so that I can use the 'get' function of the IndexField class to retrieve TF/IDF information about each term (the summary functions sumdf and sumttf of the IndexField class are too coarse for my purposes). Unfortunately, I don't know these terms in advance: they differ for every query, so they would have to be retrieved dynamically. Is this possible within the custom scoring context? If not, is there a better way to accomplish this?
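To illustrate why per-term statistics matter more than field-level totals, here is a self-contained sketch that computes a TF-IDF-style feature from an explicit list of query terms. It assumes the terms are handed to the scoring code from outside (for instance as script parameters); it does not retrieve them from the query. `FieldTermStats` is a hypothetical in-memory stand-in, not the Elasticsearch IndexField API, and the smoothed IDF formula is chosen for illustration only (Lucene's own formula differs).

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for per-field term statistics (NOT Elasticsearch's
 * IndexField). It holds a document count and per-term document frequencies.
 */
final class FieldTermStats {
    private final long docCount;             // documents that have this field
    private final Map<String, Long> docFreq; // term -> number of docs containing it

    FieldTermStats(long docCount, Map<String, Long> docFreq) {
        this.docCount = docCount;
        this.docFreq = docFreq;
    }

    /** Smoothed IDF, ln((N + 1) / (df + 1)) + 1; unseen terms get df = 0. */
    double idf(String term) {
        long df = docFreq.getOrDefault(term, 0L);
        return Math.log((docCount + 1.0) / (df + 1.0)) + 1.0;
    }

    /**
     * Sum over the supplied query terms of tf(term in current doc) * idf(term).
     * The caller must provide the query terms; this class cannot see the query.
     */
    double tfIdfFeature(List<String> queryTerms, Map<String, Integer> docTermFreqs) {
        double sum = 0.0;
        for (String term : queryTerms) {
            int tf = docTermFreqs.getOrDefault(term, 0);
            sum += tf * idf(term);
        }
        return sum;
    }
}
```

Because the IDF is looked up per term, a rare query term contributes more than a common one; a single field-wide total such as sumdf cannot make that distinction.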

Thanks!

