I am trying to implement a customized search function that will rank documents based on sparse features. Concretely, each document will have a list of sparse features, e.g.
doc_1 -> {a: 1.1, b: 0.2}
doc_2 -> {a: 0.2, z: 0.3, zz: 1.2} ...
Now at the query stage, the query will have a list of the sparse dimension that appears, e.g. [a, zz, b]. My scoring function is for each doc is simply added up all the value of terms that appear in both the query and the document. Take the above example,
doc_1_score = 0.2 + 1.2
doc_2_score = 1.1 + 0.2
My question is what is the most efficient way to implement this, I have two ideas now.
- using rank features to save all the values in documents, and then using a list of "should" to get the score
- save the features as sparse vectors, then using a dot_product to get the final score.
Which one do you think will be more efficient (memory & speed). Is there better way to accomplish this, e.g. using inverted index? Thank you!