Custom document scoring with a secondary 'penalization' query

I recently heard about a rather unusual method that a company uses to rank the documents returned by a query. First, the company's queries control scoring down to a T: they replace each document's score with the scores returned by a function_score query. Each function in the function_score query matches against a single field, and each field has a predefined weight that replaces the score Elasticsearch would otherwise assign for that match. The scores from all the functions are then summed.
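
A minimal sketch of the query shape described above, written in Python as a search request body. The field names, values, and weights are hypothetical (the post doesn't give them), and it assumes each function is a filter plus a constant `weight`. `score_mode: "sum"` adds up the weights of the functions whose filters match, and `boost_mode: "replace"` discards the wrapped query's own relevance score:

```python
import json

# Hypothetical field -> (desired value, weight) pairs; the real ones are not in the post.
FIELD_WEIGHTS = {
    "title": ("elasticsearch", 5.0),
    "category": ("search", 3.0),
    "language": ("en", 1.0),
}

def build_scoring_query(field_weights):
    """Build a function_score query whose score is the sum of the weights
    of the matching per-field filters, ignoring Elasticsearch's own score."""
    functions = [
        # One function per field: a filter plus a constant weight.
        {"filter": {"match": {field: value}}, "weight": weight}
        for field, (value, weight) in field_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",      # add the weights of all matching functions
                "boost_mode": "replace",  # use only the function score
            }
        }
    }

print(json.dumps(build_scoring_query(FIELD_WEIGHTS), indent=2))
```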

That's all fine and dandy. However, the piece that I don't understand is a second query that runs after the initial one. This second query acts as a 'penalization' query: it searches the documents returned by the initial query for fields that don't match the desired data, and the more fields that don't match, the higher the score. Then, outside Elasticsearch, the company subtracts each document's score from the second query from its score from the first query, and filters out any documents whose adjusted score falls below a certain minimum.
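
The client-side step can be sketched like this, assuming the scores from both searches have already been collected into dicts keyed by document id. The ids, scores, and threshold are made up, a document missing from the penalty results is assumed to have a penalty of 0, and whether the real cutoff uses `>=` or `>` is also an assumption:

```python
# Hypothetical scores keyed by document id, as they might come back from the two searches.
primary_scores = {"doc1": 9.0, "doc2": 6.0, "doc3": 4.0}
penalty_scores = {"doc1": 1.0, "doc3": 3.0}  # doc2 had no mismatching fields

MIN_SCORE = 3.0  # hypothetical threshold

def combine(primary, penalty, min_score):
    """Subtract each document's penalty from its primary score and keep
    only documents whose adjusted score meets the threshold."""
    adjusted = {
        doc_id: score - penalty.get(doc_id, 0.0)  # no penalty hit means no penalty
        for doc_id, score in primary.items()
    }
    kept = {doc_id: s for doc_id, s in adjusted.items() if s >= min_score}
    # Highest adjusted score first, as a results page would show them.
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)

print(combine(primary_scores, penalty_scores, MIN_SCORE))
# [('doc1', 8.0), ('doc2', 6.0)]
```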

So, given what I know about Elasticsearch, I am convinced that this second 'penalization' query is redundant, and that any scoring differences produced by subtracting one score from the other could be folded into a single query by tweaking the function_score weights and the like. However, I have no way to formally prove this. Am I right in assuming that the second 'penalization' query is redundant?
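
One way to make the argument concrete, assuming the penalty query is also a sum of constant weights: say penalty function j has filter g_j and weight p_j, and write r_i(d) and g_j(d) as 1 when a filter matches document d and 0 otherwise. Then score1(d) - score2(d) = Σ w_i·r_i(d) - Σ p_j·g_j(d) = Σ w_i·r_i(d) + Σ p_j·(1 - g_j(d)) - Σ p_j. The last term is the same for every document, so ranking is unchanged if each penalty becomes a reward function with filter `{"bool": {"must_not": [g_j]}}` and weight p_j, and the cutoff becomes the old minimum plus Σ p_j (for example via function_score's `min_score`). This also avoids relying on negative weights. The sketch below checks the identity by brute force over every match pattern; the weights and threshold are hypothetical, and it does not cover a penalty query that uses relevance scoring, or a first query that only returns its top N hits:

```python
from itertools import product

# Hypothetical weights: w_i for the primary functions, p_j for the penalty functions.
REWARD_WEIGHTS = [5.0, 3.0, 1.0]
PENALTY_WEIGHTS = [2.0, 4.0]
MIN_SCORE = 4.0  # hypothetical threshold used by the two-query approach
OFFSET = sum(PENALTY_WEIGHTS)  # constant shift between the two scores

def two_query_score(rewards, penalties):
    """Primary score minus penalty score, as computed outside Elasticsearch."""
    primary = sum(w for w, hit in zip(REWARD_WEIGHTS, rewards) if hit)
    penalty = sum(p for p, hit in zip(PENALTY_WEIGHTS, penalties) if hit)
    return primary - penalty

def single_query_score(rewards, penalties):
    """One summed score: each penalty becomes a reward for NOT mismatching."""
    primary = sum(w for w, hit in zip(REWARD_WEIGHTS, rewards) if hit)
    bonus = sum(p for p, hit in zip(PENALTY_WEIGHTS, penalties) if not hit)
    return primary + bonus

def merged_function(penalty_filter, weight):
    """Turn a penalty (filter, weight) pair into the equivalent reward function."""
    return {"filter": {"bool": {"must_not": [penalty_filter]}}, "weight": weight}

def check_equivalence():
    """Confirm for every match pattern that the scores differ by OFFSET
    and that both thresholds keep exactly the same documents."""
    for rewards in product([False, True], repeat=len(REWARD_WEIGHTS)):
        for penalties in product([False, True], repeat=len(PENALTY_WEIGHTS)):
            old = two_query_score(rewards, penalties)
            new = single_query_score(rewards, penalties)
            if new - old != OFFSET:
                return False
            if (old >= MIN_SCORE) != (new >= MIN_SCORE + OFFSET):
                return False
    return True

print(check_equivalence())  # True
```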

In addition, if summing the function_score functions and using that sum to replace each document's score is not the best approach, what would you all recommend?

Hi @ChapterSevenSeeds,

Welcome to the community! Did you come across this approach in a blog or resource that you can share?

Hi @carly.richmond!

I did not come across this approach in any publicly available resource. I just stumbled across it while doing some work for the company I mentioned above.
