Rank Eval Expect No Results

I'm really enjoying the Rank Eval API. But I have a use case for one of my indices where I'd like to test queries that should return no hits at all.

Is there an existing EvaluationMetric that handles this case?

This is an interesting use case. I don't think the "classic" metrics are appropriate for this. How many of these "no-match" queries do you need to test? Do they have to be mixed in with other queries that do have results? If so, why? If not, could you create a dedicated test set and maybe re-use one of the existing metrics, checking that its score doesn't exceed a certain value? I haven't thought this through completely, but happy to hear back from you.


As of now I have a relatively low number of these tests (~15), but I can see that number easily growing into the hundreds over the coming months. They don't have to be mixed in with the other tests.

My fallback is to split the tests into two groups of files: those that qualify for the existing EvaluationMetrics, and the no-hits cases. For the no-hits cases I'd use the PrecisionAtK EvaluationMetric (k=1, ignore_unlabeled=false), then parse the JSON response and score each test based on the irrelevant (aka unlabeled) docs that come back: if no docs are returned, score=1; otherwise score=0. Then aggregate the scores as needed.
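A minimal sketch of this fallback, assuming the Java High Level REST Client and a cluster on localhost:9200. The index name, field, and query text are placeholders; any client that can call _rank_eval and walk the per-query details in the response would work the same way:

```java
import java.util.Collections;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.rankeval.EvalQueryQuality;
import org.elasticsearch.index.rankeval.PrecisionAtK;
import org.elasticsearch.index.rankeval.RankEvalRequest;
import org.elasticsearch.index.rankeval.RankEvalResponse;
import org.elasticsearch.index.rankeval.RankEvalSpec;
import org.elasticsearch.index.rankeval.RatedRequest;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class NoHitsWorkaround {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // A "no-hits" test case: the rated-document list is deliberately
            // empty because nothing should come back for this query.
            // "my_index", "title", and the query text are placeholders.
            RatedRequest noHitsCase = new RatedRequest(
                    "no_hits_q1",
                    Collections.emptyList(),
                    new SearchSourceBuilder().query(
                            QueryBuilders.matchQuery("title", "should never match")));

            // PrecisionAtK(relevant-rating threshold, ignore_unlabeled, k),
            // i.e. precision@1 with ignore_unlabeled=false as described above.
            RankEvalSpec spec = new RankEvalSpec(
                    Collections.singletonList(noHitsCase),
                    new PrecisionAtK(1, false, 1));

            RankEvalResponse response = client.rankEval(
                    new RankEvalRequest(spec, new String[] { "my_index" }),
                    RequestOptions.DEFAULT);

            // Re-score each test: 1 if nothing came back, 0 otherwise.
            for (Map.Entry<String, EvalQueryQuality> entry :
                    response.getPartialResults().entrySet()) {
                double score = entry.getValue().getHitsAndRatings().isEmpty() ? 1.0 : 0.0;
                System.out.println(entry.getKey() + " -> " + score);
            }
        }
    }
}
```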

This is a fine backup since I'm not running tens of thousands of tests, but it would be nice if the RankEval module handled this case directly.

I spent some time trying to write a custom plugin that would add this metric to the API, but I couldn't get it to work. I'm not a Java developer, so I'm not 100% sure this is even extensible, but I kept getting held up by the XContent parser not recognizing the new parser I was trying to register for the metric object:

type": "named_object_not_found_exception",
"reason": "[101:5] unable to parse EvaluationMetric with name [no_hits_metric]: parser not found"

The goal was indeed to make metrics extensible, but I have to be honest that I don't know of any existing plugins using this. It would be interesting to take a look at what's working and what's not. Do you have your code on GitHub or somewhere else online where I could take a look at it? Unfortunately I cannot commit to any timeframe, but it would be interesting to see.

Great to hear that separating the no-hits cases into a dedicated test set seems to be a good workaround for now.
