Plugin to get only the hits with the max-score


(Jean Helou) #1

Hello,

Clinton Gormley pointed me here after closing my feature request https://github.com/elastic/elasticsearch/issues/12413 and suggested it could be implemented through a plugin.

Can someone give me pointers on how to create a plugin which would hook into the query engine so that only the results with the max score are returned?

I don't know in advance how many results there will be for a given query, so the size parameter does not help. I am only interested in getting the results which have the max score (as in SearchHits.getMaxScore).

Ideally I would be able to add a property to the search request, like so:

GET index/type/_search
{
  "topScorers": true,
  "query": {
    "match": {
      "name": "the quick brown fox"
    }
  }
}
# OR
GET index/type/_search
{
  "topScorers": 3,
  "query": {
    "match": {
      "name": "the quick brown fox"
    }
  }
}

I tried looking at the ES codebase and at a few existing plugins. As far as I can tell, plugins are supposed to provide custom Guice modules.
So I checked the modules: it looks like I would have to create custom implementations of DefaultSearchContext, FilteredSearchContext and PercolateContext to make the top-scorer flag available in the various contexts.

Then I would need a TopScorerQueryPhase which inherits from QueryPhase and overrides its execute method, calling super and then manipulating the topdocs in the queryResult to limit the score docs to only the docs with the max score.
I want to influence the topdocs in the query phase to avoid fetching the source and highlighting for results which will not be returned anyway.
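
To make the trimming step concrete, here is a minimal sketch in plain Java, with no Elasticsearch or Lucene dependency. The class TopScorerSketch, the Hit type (a stand-in for a Lucene score doc) and the keepMaxScore method are all hypothetical names used for illustration only; this is not an existing ES API. It assumes the hits are already sorted by descending score, and it can only trim what the query phase actually collected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: none of these names exist in Elasticsearch.
final class TopScorerSketch {

    /** Stand-in for a scored document: a doc id plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Returns the leading hits whose score equals the first (maximum) score.
     * Assumes sortedHits is ordered by descending score.
     */
    static List<Hit> keepMaxScore(List<Hit> sortedHits) {
        if (sortedHits.isEmpty()) {
            return Collections.emptyList();
        }
        float maxScore = sortedHits.get(0).score;
        List<Hit> top = new ArrayList<>();
        for (Hit hit : sortedHits) {
            // Stop at the first hit that scores below the maximum.
            if (Float.compare(hit.score, maxScore) != 0) {
                break;
            }
            top.add(hit);
        }
        return top;
    }
}
```

In a real plugin the same loop would run over the score docs held by the query result after the parent query phase has executed, so that the fetch phase only sees the trimmed list.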

How do I proceed from here?
Initializing a Maven project for the plugin is not an issue and is well documented; how to actually hook into ES code from the plugin and change the behavior of the query phase of the _search endpoint is not so obvious...

Thanks


(system) #2