I recently heard about a rather unusual method that a company uses to rank documents for a query. First, its queries control scoring precisely by replacing the document scores with the individual scores returned by a function_score query. Each function in the function_score query matches against an individual field, and each field has a predefined weight that replaces the score Elasticsearch would otherwise assign. The resulting scores from each function are then summed.
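To make the setup concrete, here is a minimal sketch of what I understand the first query to look like, written as a Python dict for the request body. The field names, desired values, and weights are hypothetical placeholders of mine; only the overall shape (one weighted function per field, summed, replacing the original score) comes from what I was told.

```python
import json

# Hypothetical field names, desired values, and weights (my assumptions).
FIELD_WEIGHTS = {"title": 5.0, "brand": 3.0, "category": 2.0}
DESIRED = {"title": "running shoes", "brand": "acme", "category": "footwear"}


def build_scoring_query(desired, weights):
    """Build a function_score body with one weighted filter function per field."""
    functions = [
        {"filter": {"match": {field: desired[field]}}, "weight": weight}
        for field, weight in weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",      # add up the weights of the functions that matched
                "boost_mode": "replace",  # discard the inner query's own relevance score
            }
        }
    }


query_body = build_scoring_query(DESIRED, FIELD_WEIGHTS)
print(json.dumps(query_body, indent=2))
```

The inner match_all is also an assumption; the real queries may well restrict the candidate set with something more selective.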
That's all fine and dandy. However, the piece that I don't understand is a second query that is performed after the initial query. This second query acts as a 'penalization' query: it checks the documents returned by the initial query for fields that don't match the desired data. The more fields that fail to match, the higher the score. Then, outside Elasticsearch, the company subtracts each document's score in the second query from its score in the first query. It then filters out any documents that fall below a certain minimum score.
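As I understand it, the step that happens outside Elasticsearch amounts to something like the sketch below. The document IDs, scores, and threshold are made-up numbers for illustration, and I am assuming the minimum-score check is inclusive.

```python
def combine_scores(first_scores, penalty_scores, min_score):
    """Subtract each document's penalty from its first-query score, then drop low scorers."""
    combined = {
        doc_id: score - penalty_scores.get(doc_id, 0.0)
        for doc_id, score in first_scores.items()
    }
    # Assumption: a document is kept when its combined score is >= min_score.
    return {doc_id: s for doc_id, s in combined.items() if s >= min_score}


# Made-up scores keyed by document ID.
first_scores = {"doc1": 10.0, "doc2": 8.0, "doc3": 5.0}
penalty_scores = {"doc1": 2.0, "doc2": 6.0, "doc3": 0.0}

kept = combine_scores(first_scores, penalty_scores, min_score=4.0)
print(kept)  # {'doc1': 8.0, 'doc3': 5.0}
```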
So, given what I know about Elasticsearch, I am convinced that this second 'penalization' query is redundant: any scoring difference that results from subtracting one score from the other could be reproduced in a single query by adjusting the function_score weights. However, I have no way to formally prove this. Am I right in assuming that the second 'penalization' query is redundant?
In addition, if summing the function_score query's functions and using that sum to replace each document's score is not the best approach, what would you all recommend?