Passing document field value as parameter for advanced script

maarab · April 1, 2018, 11:29am

In Elasticsearch, I have implemented a custom script in Java and am using it in a query. The Java code (packaged as an Elasticsearch plugin) is a method that accepts two strings and returns a value. One is a hard-coded string; the other should be the current document's field value. I'm not able to pass the document's field value as a parameter. Maybe I'm doing it wrong and there is an easier way. The query should be something like:

{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
                "source": "string_compare",
                "lang" : "expert_scripts",
                "params": {
                    "inputValue": "The Star Wars",
                    "docValue": <current documents value of movieName field>
                }
            }
          }
        }
      ]
    }
  }
}

The placeholder <current documents value of movieName field> is the part I can't fill in: the string 'The Star Wars' should be compared with the movieName value of every document. If the placeholder is replaced with a hard-coded value, the query works.
Alternatively, if the field can be accessed directly in the Java code instead of being passed as a parameter, that would work as well.

rjernst · April 6, 2018, 4:01pm

params is for passing constant values for the execution of a single query. To access doc values, you need to interact with Lucene, within your script. Have you looked at how the advanced script example docs do this?

maarab · April 8, 2018, 7:21am

Thank you for replying. I did look at the advanced script example docs but don't understand how to access a field value per document. My initial understanding was that the native script would execute for every document, the way a function in a SQL query executes for every row. If the query returns 3 documents (doc:1,2,3), then to compare against the constant value 'The Star Wars', the script should run as:
doc:1, movieName:"Star Wars" (script runs to compare ('The Star Wars','Star Wars'))
doc:2, movieName:"Starr Warz" (script runs to compare ('The Star Wars','Starr Warz'))
doc:3, movieName:"The Star Wars" (script runs to compare ('The Star Wars','The Star Wars'))

I have two questions with respect to this:

Will the script run once per query or once per document in the query's result, i.e., once or three times in the example above?
If it runs per document, how can I access the document field value? 'The Star Wars' input value can be passed as params and the document field value can be accessed within the plugin.

rjernst · April 13, 2018, 6:01pm

Will the script run once per query or once per document in the query's result, i.e., once or three times in the example above?

This depends on the type of script being run, but in your case (a scoring script) it runs once for every document matching the query.
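To make the per-document behavior concrete, here is a plain-Java toy model (no Elasticsearch or Lucene classes) of a scoring script: the constant from params is fixed once, and the comparison body runs once per matching document. The stringCompare method is a hypothetical stand-in for the plugin's string_compare script, which the thread never shows; a normalized Levenshtein similarity is assumed purely for illustration.

```java
import java.util.List;

/**
 * Toy model (not Elasticsearch code): one constant input, one script
 * invocation per matching document, each with that document's field value.
 */
public class PerDocumentScoring {

    // How many times the "script body" ran during the last scoreAll call.
    static int invocations = 0;

    // Hypothetical stand-in for string_compare: 1 - (Levenshtein distance / longer length).
    static double stringCompare(String inputValue, String docValue) {
        invocations++;
        int n = inputValue.length();
        int m = docValue.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of inputValue's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of docValue's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = inputValue.charAt(i - 1) == docValue.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        int maxLen = Math.max(n, m);
        return maxLen == 0 ? 1.0 : 1.0 - (double) d[n][m] / maxLen;
    }

    // Mimics the engine: the same constant is compared against every document's value.
    static double[] scoreAll(String inputValue, List<String> docValues) {
        invocations = 0;
        double[] scores = new double[docValues.size()];
        for (int doc = 0; doc < docValues.size(); doc++) {
            scores[doc] = stringCompare(inputValue, docValues.get(doc));
        }
        return scores;
    }

    public static void main(String[] args) {
        String inputValue = "The Star Wars"; // constant, as if passed via params
        List<String> movieNames = List.of("Star Wars", "Starr Warz", "The Star Wars");
        double[] scores = scoreAll(inputValue, movieNames);
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("doc:%d movieName:\"%s\" score=%.3f%n", i + 1, movieNames.get(i), scores[i]);
        }
        System.out.println("script invocations: " + invocations);
    }
}
```

Running main prints one score per document and reports three invocations, one for each of the three documents in the example above.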

If it runs per document, how can I access the document field value? 'The Star Wars' input value can be passed as params and the document field value can be accessed within the plugin.

While this is not in the advanced scoring script example I referenced in my last reply, I will use that as a basis for how to access this data. Within the newInstance method, you would add a line like:

SortedSetDocValues dv = DocValues.getSortedSet(context.reader(), "movieName");

Then inside setDocument(int docid) of the created SearchScript, you would advance the doc values iterator:

dv.advanceExact(docid);

And finally, within the runAsDouble() method, you can access the value of the field:

long valueOrdinal = dv.nextOrd(); // ordinals are long values in the SortedSetDocValues API
BytesRef valueBytes = dv.lookupOrd(valueOrdinal);
String movieName = valueBytes.utf8ToString();

There are a couple of things to note in the last code snippet:

nextOrd() should only be called if advanceExact returned true. My example assumes every document has a value for the "movieName" field, and thus does not check it, but it would be good to assert or verify this.
It would be more efficient to use the raw BytesRef if possible. For example, if you are comparing against another static string, you can construct a BytesRef for that string once and call compareTo, rather than decoding the UTF-8 bytes of every movie name into a String on every query.
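Both notes can be illustrated with a self-contained toy model. ToySortedSetDocValues below is a hypothetical in-memory class, not Lucene's SortedSetDocValues; it only mimics the shape of the calls used in the snippet (advanceExact, nextOrd, lookupOrd), and its lookupOrd returns a plain byte[] where Lucene returns a BytesRef. The sketch guards every read with advanceExact so that documents without a value are skipped, and compares raw UTF-8 bytes against a target encoded once, instead of decoding each value into a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical in-memory stand-in for ordinal-based sorted-set doc values
 * (NOT Lucene's class): distinct terms are stored once in a sorted
 * dictionary, and each document stores the ordinals of its terms.
 */
public class ToySortedSetDocValues {
    static final long NO_MORE_ORDS = -1; // returned by nextOrd() when the doc has no more values

    private final byte[][] dictionary; // distinct terms as UTF-8 bytes, in sorted order
    private final long[][] docOrds;    // ordinals per document; empty array = no value
    private long[] current = new long[0];
    private int pos = 0;

    ToySortedSetDocValues(byte[][] dictionary, long[][] docOrds) {
        this.dictionary = dictionary;
        this.docOrds = docOrds;
    }

    /** Positions on docId; returns false if the document has no value for the field. */
    boolean advanceExact(int docId) {
        current = docOrds[docId];
        pos = 0;
        return current.length > 0;
    }

    /** Next ordinal of the current document, or NO_MORE_ORDS when exhausted. */
    long nextOrd() {
        return pos < current.length ? current[pos++] : NO_MORE_ORDS;
    }

    /** Term bytes for an ordinal (Lucene would return a BytesRef here). */
    byte[] lookupOrd(long ord) {
        return dictionary[(int) ord];
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Hypothetical example data: four documents, the last one has no movieName value. */
    static ToySortedSetDocValues example() {
        byte[][] dict = {utf8("Star Wars"), utf8("Starr Warz"), utf8("The Star Wars")}; // sorted
        long[][] ords = {{0}, {1}, {2}, {}};
        return new ToySortedSetDocValues(dict, ords);
    }

    /** For each doc, whether its first value equals target, compared as raw bytes. */
    static boolean[] matches(ToySortedSetDocValues dv, int maxDoc, String target) {
        byte[] targetBytes = utf8(target); // encoded once, not once per document
        boolean[] result = new boolean[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!dv.advanceExact(doc)) {
                continue; // no value: nextOrd() must not be called
            }
            long ord = dv.nextOrd();
            result[doc] = Arrays.compareUnsigned(dv.lookupOrd(ord), targetBytes) == 0;
        }
        return result;
    }

    public static void main(String[] args) {
        ToySortedSetDocValues dv = example();
        System.out.println(Arrays.toString(matches(dv, 4, "The Star Wars")));
    }
}
```

With the example data only document 2 (0-based) matches, and document 3 is skipped because it has no value. Arrays.compareUnsigned (Java 9+) compares bytes as unsigned values, the same byte ordering BytesRef.compareTo uses; for a pure equality check, Arrays.equals would be enough.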

maarab · April 19, 2018, 6:31am

Thank you for the reply. I also found the following plugin example by @nik9000:

github.com/elastic/elasticsearch/blob/6.2/plugins/examples/rescore/src/main/java/org/elasticsearch/example/rescore/ExampleRescoreBuilder.java

After adapting it to my requirements, I have the following:
Query:

        GET movie-idx/_search
        {
          "query": {
            "bool": {
              "must": [
                {
                  "query_string": {
                    "fields": [
                      "MovieName"
                    ],
                    "query": "Star Wars",
                    "minimum_should_match": "61%",
                    "fuzziness": 1,
                    "_name": "fuzzy"
                  }
                }
              ]
            }
          },
          "rescore": {
            "calculateMovieScore": {
              "MovieName": "Star Wars"
            }
          }
        }

And my rescorer class looks like:

private static class DocsRescorer implements Rescorer {
        private static final DocsRescorer INSTANCE = new DocsRescorer();

        @Override
        public TopDocs rescore(TopDocs topDocs, IndexSearcher searcher, RescoreContext rescoreContext) throws IOException {
            DocRescoreContext context = (DocRescoreContext) rescoreContext;
            int end = Math.min(topDocs.scoreDocs.length, rescoreContext.getWindowSize());

            // Local variable renamed so it no longer shadows the MovieScorer type name
            MovieScorer movieScorer = new MovieScorerBuilder()
                    .withInputName(context.MovieName)
                    .build();

            for (int i = 0; i < end; i++) {
                String name = <get MovieName values from actual document returned by topdocs>
                float score = movieScorer.calculateScore(name);
                topDocs.scoreDocs[i].score = score;
            }

            List<ScoreDoc> scoreDocList =  Stream.of(topDocs.scoreDocs).filter((a) -> a.score >= context.threshold).sorted(
                    (a, b) -> {
                        if (a.score > b.score) {
                            return -1;
                        }
                        if (a.score < b.score) {
                            return 1;
                        }
                        // Safe because doc ids >= 0
                        return a.doc - b.doc;
                    }
            ).collect(Collectors.toList());
            ScoreDoc[] scoreDocs = scoreDocList.toArray(new ScoreDoc[scoreDocList.size()]);
            topDocs.scoreDocs = scoreDocs;
            return topDocs;
        }

        @Override
        public Explanation explain(int topLevelDocId, IndexSearcher searcher, RescoreContext rescoreContext,
                                   Explanation sourceExplanation) throws IOException {
            DocRescoreContext context = (DocRescoreContext) rescoreContext;
            // Note that this is inaccurate because it ignores factor field
            return Explanation.match(context.factor, "test", singletonList(sourceExplanation));
        }

        @Override
        public void extractTerms(IndexSearcher searcher, RescoreContext rescoreContext, Set<Term> termsSet) {
            // Since we don't use queries there are no terms to extract.
        }
    }

My understanding is that the rescorer code executes once: it receives the top docs returned by the initial query (the fuzzy search in this case), and the loop for (int i = 0; i < end; i++) iterates over each returned document, up to the rescore window size. The place where I need help is:
String name = <get MovieName value from actual document returned by topdocs>
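The thread ends without an answer to this question, so what follows is only a sketch of the underlying Lucene mechanism, not a confirmed solution. In a rescorer, topDocs.scoreDocs[i].doc is a top-level doc ID for the whole index reader, whereas doc values such as the ones opened with context.reader() earlier in the thread are per segment. Each LeafReaderContext exposes a docBase, and Lucene's ReaderUtil.subIndex(doc, leaves) returns the index of the leaf that contains a top-level doc; the per-segment ID is then doc - leaf.docBase. The plain-Java toy below (hypothetical class name, hypothetical segment layout) reproduces only that arithmetic, using a binary search over the doc bases.

```java
import java.util.Arrays;

/**
 * Toy model (not Lucene code) of mapping a top-level doc ID to
 * {segment index, segment-local doc ID} using each segment's docBase.
 */
public class DocIdToSegment {

    /** Index of the last segment whose docBase <= topLevelDoc; docBases ascend from 0. */
    static int segmentIndex(int[] docBases, int topLevelDoc) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle so that lo = mid always makes progress
            if (docBases[mid] <= topLevelDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Converts a top-level doc ID into {segmentIndex, segmentLocalDocId}. */
    static int[] resolve(int[] docBases, int topLevelDoc) {
        int seg = segmentIndex(docBases, topLevelDoc);
        return new int[] {seg, topLevelDoc - docBases[seg]};
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // hypothetical: segments starting at docs 0, 100 and 250
        for (int doc : new int[] {5, 100, 260}) {
            System.out.println(doc + " -> " + Arrays.toString(resolve(docBases, doc)));
        }
    }
}
```

The segment-local ID is what would be passed to advanceExact on that segment's doc values. Because doc-values iterators only move forward, the documents in the rescore window would need to be visited in increasing doc ID order within each segment (for example, by iterating over a copy of the window sorted by doc ID), not in score order.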

system · May 17, 2018, 6:31am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Passing parameters to script dynamically in Elastic Search	Elasticsearch	1	763	July 10, 2019
Advanced score ScriptPlugin	Elasticsearch	3	512	November 22, 2019
Use another document's value for script_score computation	Elasticsearch, painless	1	264	September 2, 2021
Incrementing param value within a script	Elasticsearch, painless	3	329	March 19, 2021
How to access document fields from script?	Elasticsearch	2	351	July 6, 2017