Hello!
I'm working on a plugin to add some cheminformatics features to our Elasticsearch cluster. Several of these features have an inverted-index-based screening stage (which fits well as a Lucene query), followed by a more expensive algorithm that eliminates false positives.
The nature of these algorithms lets us use the setMinCompetitiveScore API to skip the expensive steps when sorting by score, which keeps the query tractable even for non-selective queries. However, we would also like to sort by other fields, which is a challenge out of the box because of how Lucene sorting works: every hit has to be collected before sorting, otherwise hits could be lost.
One potential solution I'm considering is to create a custom doc collector and move the false positive check there, so that the pseudocode would look something like this:
get next doc id from screening step
check sort value of doc against priority queue; if it won't make it into the top K hits, skip it
else run the expensive false positive check and collect result if it passes
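To make those steps concrete, here is a rough sketch in plain Java. It is only an illustration of the idea, not code against the Lucene Collector API: the class name LazyVerifyTopK, the collect method, and its parameters are all made up for this post, the sort is assumed to be ascending on a single long value, and candidates are assumed to arrive in increasing doc id order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;
import java.util.function.IntToLongFunction;

// Hypothetical stand-in for a custom collector; none of these names come from Lucene.
final class LazyVerifyTopK {

    /**
     * Returns the doc ids of the k verified hits with the smallest sort values,
     * best first. Assumes candidates arrive in increasing doc id order.
     */
    static List<Integer> collect(int[] candidates,
                                 IntToLongFunction sortValue,
                                 IntPredicate expensiveCheck,
                                 int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }

        // Max-heap on (sort value, doc id): the head is the weakest hit kept so far.
        Comparator<long[]> bySortThenDoc =
            Comparator.<long[]>comparingLong(h -> h[0]).thenComparingLong(h -> h[1]);
        PriorityQueue<long[]> queue = new PriorityQueue<>(bySortThenDoc.reversed());

        for (int doc : candidates) {
            long value = sortValue.applyAsLong(doc);

            // Cheap test first: if the queue is full and this doc cannot beat the
            // weakest kept hit, skip it without running the expensive check.
            // Ties lose because the earlier doc id is preferred.
            if (queue.size() == k && value >= queue.peek()[0]) {
                continue;
            }

            // Only competitive docs pay for the false positive check.
            if (!expensiveCheck.test(doc)) {
                continue;
            }

            queue.add(new long[] {value, doc});
            if (queue.size() > k) {
                queue.poll(); // evict the weakest hit
            }
        }

        // Polling yields weakest first, so reverse to get best first.
        while (!queue.isEmpty()) {
            result.add((int) queue.poll()[1]);
        }
        Collections.reverse(result);
        return result;
    }
}
```

The point of the ordering is that the queue only ever contains verified hits, so a doc rejected by the cheap comparison could never have made the top K and nothing is lost by skipping its expensive check.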
However, I'm not sure if there is a place to inject this doc collector via a plugin - it's not obvious to me if the plugin interface supports that. Any guidance on how I can accomplish this? As a last resort, I was considering creating a custom action plugin that exposes our own search API, but I'd rather not have to keep that up to date with the evolution of changes in the main search API, since we'll be composing this with other search capabilities.