I'm trying to measure the effectiveness of query changes.
I am collecting data about what searches users are making and what their target document was.
To that end, I want to play back the searches later to find out whether a query modification helps or hurts the relevancy of the search results.
Is there a way to select the rank of a document in a specific search? That is, I want to perform a search and get the position of a specific document in the search results.
Is there an easy way to do this? Previously I paged through all results until I found the document in question. This was slow but feasible, since there are not that many documents.
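For reference, the paging approach described above could be sketched roughly as follows. This is only an illustration: `search_page` is a hypothetical callable standing in for whatever search client is in use (it is not a real library API), and `make_fake_backend` exists purely so the sketch runs on its own.

```python
from typing import Callable, List, Optional

# Hypothetical signature: (query, offset, page_size) -> list of document ids,
# in ranked order. Replace with a call to your actual search client.
SearchPage = Callable[[str, int, int], List[str]]


def find_rank(search_page: SearchPage, query: str, target_id: str,
              page_size: int = 50, max_results: int = 10000) -> Optional[int]:
    """Return the 1-based rank of target_id for query, or None if not found."""
    offset = 0
    while offset < max_results:
        ids = search_page(query, offset, page_size)
        if target_id in ids:
            return offset + ids.index(target_id) + 1
        if len(ids) < page_size:
            # A short page means the result set is exhausted.
            return None
        offset += page_size
    return None


def make_fake_backend(ranked_ids: List[str]) -> SearchPage:
    """Stand-in backend that returns the same ranking for every query."""
    def search_page(query: str, offset: int, size: int) -> List[str]:
        return ranked_ids[offset:offset + size]
    return search_page


if __name__ == "__main__":
    backend = make_fake_backend([f"doc{i}" for i in range(120)])
    print(find_rank(backend, "some query", "doc57"))
```

Replaying each logged query through `find_rank` before and after a query change gives a per-query rank to compare, at the cost of one request per page.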
That said, you may not need to solve that problem at all, as there are better ways to evaluate the relevance of search results:
In classical information retrieval, this sort of testing is done based on top N results using what are known as judgement lists. Judgement lists are graded (or automatically captured) evaluations of individual results for specific queries.
For example, we have a product, Quepid, that helps you evaluate relevance. It lets users grade results on a scale of 1-10 (10 is the best possible result, 1 is terrible). From those grades we know the ideal top 10 results for a query, and we also know the 10 results you are getting right now. You can use any number of formulas, such as various forms of discounted cumulative gain, to evaluate the current results against that query's ideal results. In Quepid we have been chewing on this and still haven't perfected it, but what we do is simply take the average of the 1-10 grades of the current results and penalize it based on the edit distance from the best possible results.
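To make the discounted cumulative gain idea concrete, here is a minimal sketch of one common form, normalized DCG over the top N results. It is not Quepid's formula. It assumes a judgement list stored as a plain dict from document id to grade, and it treats unjudged documents as grade 0, which is an assumption of this sketch; the function names are made up for illustration.

```python
import math
from typing import Dict, List


def dcg(grades: List[float]) -> float:
    """Discounted cumulative gain: each grade is divided by log2(position + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(grades, start=1))


def ndcg_at_n(ranked_ids: List[str], judgements: Dict[str, float],
              n: int = 10) -> float:
    """DCG of the current top n, divided by the DCG of the ideal top n."""
    # Assumption: documents missing from the judgement list count as grade 0.
    current = [judgements.get(doc_id, 0) for doc_id in ranked_ids[:n]]
    # The ideal ordering is simply the judged grades sorted best-first.
    ideal = sorted(judgements.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(current) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    judgements = {"a": 10, "b": 7, "c": 3}  # grades for one query
    print(ndcg_at_n(["a", "b", "c"], judgements))  # ideal order
    print(ndcg_at_n(["c", "b", "a"], judgements))  # reversed order scores lower
```

Computing this score per query before and after a query change, then comparing the averages, gives a single number to track without having to locate one specific target document.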