I'm getting started with Groovy scripts, and I'm having some trouble understanding the scope and limitations of the Groovy "API." At a high level, I'm trying to build queries that return what could be called "multi-score" results (e.g., "document 1 scored 80% for relevance to your query and 90% for popularity/quality, giving a combined score of 85%", where all three numbers appear in the search results). To do this, I've been experimenting with computing the component scores using script_fields and an overall score using script_score.
Here are the main issues I've run into so far:
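For concreteness, a request of the kind described above might look roughly like the following sketch. It is written as a Python dict that serializes to the JSON body. The field names `body` and `popularity`, the 0.5/0.5 weights, and the ES 2.x-style `inline` script syntax are all assumptions for illustration, not details of my actual setup.

```python
import json

# Hypothetical field names: "body" and "popularity" are placeholders.
# The {"lang": ..., "inline": ...} script form assumes ES 2.x-style syntax.
query_body = {
    "query": {
        "function_score": {
            "query": {"match": {"body": "search terms"}},
            # script_score: computes the combined score used for ranking.
            "script_score": {
                "script": {
                    "lang": "groovy",
                    "inline": "0.5 * _score + 0.5 * doc['popularity'].value",
                }
            },
            # Use the script's result as the final score.
            "boost_mode": "replace",
        }
    },
    # script_fields: a component score reported alongside each hit.
    "script_fields": {
        "popularity_score": {
            "script": {"lang": "groovy", "inline": "doc['popularity'].value"}
        }
    },
}

print(json.dumps(query_body, indent=2))
```

Note that the relevance component (`_score`) is only visible to the script_score script here, which is the limitation the questions below are about.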
Is there any way to access the inner_hits data from a nested query from within Groovy script?
Is there any way to access script fields within a script_score script?
I've been doing more experiments, and have found a few things out:
I'd assumed that script_fields code was executed before script_score because _score is not available as a variable inside the script_fields code. This turned out to be false: script_fields is executed only on the hits that are returned, and it is executed serially, e.g., adding sleep(500); to script_fields results in 25,000 ms of additional delay when there are 50 hits. Since script_fields runs after the hits have already been scored and selected, it is impossible to use script fields in score calculations.
script_score is parallelized, e.g., on a 350k-document, 5-shard index, adding sleep(1); to script_score resulted in a roughly 87 s delay, not a 350 s delay. I'm not sure why I got a ~4x speedup rather than a 5x one.
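The arithmetic behind those two timings can be checked with a short sketch. It assumes each sleep call takes exactly its nominal duration and that documents are spread evenly across the shards; neither assumption was verified.

```python
# Serial case (script_fields): delay scales with the number of returned hits.
hits = 50
sleep_per_hit_ms = 500
serial_delay_ms = hits * sleep_per_hit_ms  # 50 * 500 ms

# Parallel case (script_score): every matching document gets scored.
docs = 350_000
sleep_per_doc_ms = 1
shards = 5

fully_serial_s = docs * sleep_per_doc_ms / 1000   # delay if nothing ran in parallel
ideal_parallel_s = fully_serial_s / shards        # delay if all 5 shards ran evenly
observed_s = 87                                   # delay actually measured
effective_speedup = fully_serial_s / observed_s   # speedup implied by the measurement

print(serial_delay_ms, fully_serial_s, ideal_parallel_s, round(effective_speedup, 2))
```

Under these assumptions the ideal 5-shard delay would be 70 s, so the measured 87 s corresponds to a speedup of about 4.0 rather than 5. The numbers alone don't say where the extra 17 s goes.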
I also experimented with using python instead of groovy, which allowed me to poke at the available locals (in script_fields anyways). There don't seem to be any variables related to inner hits. The list of variables is ['_CACHE', '_FREQUENCIES', '_OFFSETS', '_PAYLOADS', '_POSITIONS', '_doc', '_fields', '_index', '_source', 'doc'], and in script_score there's the added variable _score.
I skimmed a bit of the python-script extension's source code, and couldn't see any sign that it was selectively including some variables, but not others. Thus, I'm inclined to believe that inner hits are not accessible from within a groovy script.
It might be possible to fork and modify ES to support this feature (I haven't yet thought of a reason why it can't be done, anyways), but that's too big a side project for my current team. In the end, it was easiest to just recompute the inner hits from doc inside of script_fields. Not sure whether that's much help for your project or not.
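As a rough illustration of the "recompute it yourself" idea, here is a plain-Python stand-in, not the actual script. It assumes the nested objects are available as a list of dicts (as they would be in a `_source`-shaped document), and the helper name `recompute_inner_hits` and the `comments`/`author` fields are hypothetical.

```python
def recompute_inner_hits(source, nested_path, field, wanted):
    """Return the nested objects under `nested_path` whose `field` equals `wanted`.

    This re-applies the nested query's condition by hand, since the real
    inner_hits results do not appear to be exposed to the script.
    """
    return [
        child
        for child in source.get(nested_path, [])
        if child.get(field) == wanted
    ]


# Hypothetical document shaped like _source, with a nested "comments" list.
source = {
    "title": "doc 1",
    "comments": [
        {"author": "alice", "text": "first"},
        {"author": "bob", "text": "second"},
        {"author": "alice", "text": "third"},
    ],
}

matches = recompute_inner_hits(source, "comments", "author", "alice")
print([m["text"] for m in matches])
```

The obvious downside is that the nested condition is evaluated twice, once by the query and once by the script, and the two have to be kept in sync by hand.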