I'm new to Elasticsearch. What is the best practice regarding rescoring? We need to do some complex searches, including edit distances. From a performance perspective, is rescoring the recommended approach, or does it come with its own set of issues that lead people away from using it?
Rescoring is useful for limiting the scope of an expensive operation. For example, if you find that proximity searches are pricey compared to your normal searches, you can restrict them to the top 500 results per shard or so.
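To make that concrete, here is a rough sketch of what such a request body could look like, built as a Python dict. The field name "message", the slop, and the two weights are placeholders I picked for illustration, not values from your setup; the overall shape (window_size plus a rescore_query) follows the rescore section of the search API.

```python
import json


def build_rescore_body(text, field="message", window_size=500, slop=2):
    """Cheap match query first, then a phrase rescore on the top hits per shard.

    The field name, slop, and weights are illustrative assumptions.
    """
    return {
        # First pass: the inexpensive query that runs over every matching doc.
        "query": {"match": {field: text}},
        "rescore": {
            # Only the top `window_size` hits on each shard get rescored.
            "window_size": window_size,
            "query": {
                # Second pass: the expensive proximity query.
                "rescore_query": {
                    "match_phrase": {field: {"query": text, "slop": slop}}
                },
                "query_weight": 0.7,
                "rescore_query_weight": 1.2,
            },
        },
    }


body = build_rescore_body("quick brown fox")
print(json.dumps(body, indent=2))
```

You would send the printed JSON as the body of a normal _search request. Tune window_size against your own latency numbers rather than taking 500 as a recommendation.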
One of the problems with rescore is that it expands the "window" of the search: each shard always has to collect at least as many hits as the rescore window. Normally, when you search with size=10&from=0 (the defaults), Elasticsearch only collects 10 hits per shard. If you have a rescore on the top 10,000 documents, Elasticsearch has to collect the top 10,000 documents on each shard and then apply the rescore, shuffling the documents into the right order. "Collecting documents" means putting them in a min-heap, so the query has an O(hits * log(num_collected)) component, where hits is the number of matching documents and num_collected is the size of the window.
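If you want a feel for that heap cost, here is a small toy model in Python. It is not Elasticsearch's actual collector, just a bounded min-heap over random scores, and collect_top_k is a name I made up. It counts how many heap operations a window of 10 needs versus a window of 10,000 over the same hits.

```python
import heapq
import random


def collect_top_k(scores, k):
    """Keep the k best (score, doc_id) pairs in a min-heap; count heap operations."""
    heap = []
    heap_ops = 0
    for doc_id, score in enumerate(scores):
        if len(heap) < k:
            # Window not full yet: every hit is pushed, costing O(log k).
            heapq.heappush(heap, (score, doc_id))
            heap_ops += 1
        elif score > heap[0][0]:
            # Beats the weakest collected hit: replace the heap root, O(log k).
            heapq.heapreplace(heap, (score, doc_id))
            heap_ops += 1
    # Best hits first.
    return sorted(heap, reverse=True), heap_ops


random.seed(0)
scores = [random.random() for _ in range(100_000)]  # stand-in for per-hit scores

top10, ops10 = collect_top_k(scores, 10)
top10k, ops10k = collect_top_k(scores, 10_000)
print("window 10:     heap operations =", ops10)
print("window 10,000: heap operations =", ops10k)
```

With random scores, most hits are rejected after a single comparison against the heap root, so the O(hits * log(num_collected)) figure is the worst case. The bigger window still does far more heap work, and each operation is on a deeper heap.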