How do I access term vector in painless scripting?

Noppanit_Charassinvi · May 4, 2017, 2:35pm

We're trying to replicate this ES plugin https://github.com/MLnick/elasticsearch-vector-scoring. The reason is AWS ES doesn't allow any custom plugin to be installed. The plugin is just doing dot product and cosine similarity so I'm guessing it should be really simple to replicate that in painless script. It looks like groovy scripting is deprecated in 5.0 and disabled in AWS.

Here's the source code of the plugin.

    /**
     * @param params index that a scored are placed in this parameter. Initialize them here.
     */
    @SuppressWarnings("unchecked")
    private PayloadVectorScoreScript(Map<String, Object> params) {
        params.entrySet();
        // get field to score
        field = (String) params.get("field");
        // get query vector
        vector = (List<Double>) params.get("vector");
        // cosine flag
        Object cosineParam = params.get("cosine");
        if (cosineParam != null) {
            cosine = (boolean) cosineParam;
        }
        if (field == null || vector == null) {
            throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": field or vector parameter missing!");
        }
        // init index
        index = new ArrayList<>(vector.size());
        for (int i = 0; i < vector.size(); i++) {
            index.add(String.valueOf(i));
        }
        if (vector.size() != index.size()) {
            throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": index and vector array must have same length!");
        }
        if (cosine) {
            // compute query vector norm once
            for (double v: vector) {
                queryVectorNorm += Math.pow(v, 2.0);
            }
        }
    }

    @Override
    public Object run() {
        float score = 0;
        // first, get the ShardTerms object for the field.
        IndexField indexField = this.indexLookup().get(field);
        double docVectorNorm = 0.0f;
        for (int i = 0; i < index.size(); i++) {
            // get the vector value stored in the term payload
            IndexFieldTerm indexTermField = indexField.get(index.get(i), IndexLookup.FLAG_PAYLOADS);
            float payload = 0f;
            if (indexTermField != null) {
                Iterator<TermPosition> iter = indexTermField.iterator();
                if (iter.hasNext()) {
                    payload = iter.next().payloadAsFloat(0f);
                    if (cosine) {
                        // doc vector norm
                        docVectorNorm += Math.pow(payload, 2.0);
                    }
                }
            }
            // dot product
            score += payload * vector.get(i);
        }
        if (cosine) {
            // cosine similarity score
            if (docVectorNorm == 0 || queryVectorNorm == 0) return 0f;
            return score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
        } else {
            // dot product score
            return score;
        }
    }

I'm trying to start with just getting a field from index. But I'm getting error.

Here's the shape of my index.

I've enabled delimited_payload_filter

"settings" : {
    "analysis": {
            "analyzer": {
               "payload_analyzer": {
                  "type": "custom",
                  "tokenizer":"whitespace",
                  "filter":"delimited_payload_filter"
                }
      }
    }
 }

And I have a field called @model_factor to store a vector.

{
    "movies" : {
        "properties" : {
            "@model_factor": {
                            "type": "text",
                            "term_vector": "with_positions_offsets_payloads",
                            "analyzer" : "payload_analyzer"
                     }
        }
    }
}

And this is the shape of the document

{
    "@model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
    "name": "Test 1"
}

Here's how I use the script

{
    "query": {
        "function_score": {
            "query" : {
                "query_string": {
                    "query": "*"
                }
            },
            "script_score": {
                "script": {
        			"inline": "def termInfo = doc['_index']['@model_factor'].get('1', 4);",
                	"lang": "painless",
                	"params": {
                    	"field": "@model_factor",
                    	"vector": [0.1,2.3,-1.6,0.7,-1.3],
                    	"cosine" : true
                    }
				}
            },
            "boost_mode": "replace"
        }
    }
}

And this is the error I got.

"failures": [
      {
        "shard": 2,
        "index": "test",
        "node": "ShL2G7B_Q_CMII5OvuFJNQ",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "caused_by": {
            "type": "wrong_method_type_exception",
            "reason": "wrong_method_type_exception: cannot convert MethodHandle(List,int)int to (Object,String)String"
          },
          "script_stack": [
            "termInfo = doc['_index']['@model_factor'].get('1',4);",
            "              ^---- HERE"
          ],
          "script": "def termInfo = doc['_index']['@model_factor'].get('1',4);",
          "lang": "painless"
        }
      }
    ]

The question is how do I access the index field to get @model_factor in painless scripting?

system · June 1, 2017, 2:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to access term_vectors (payloads) from painless script Elasticsearch	1	266	June 17, 2021
Painless script don't have access to a _index variable and groovy is depreciated in 5.4 Elasticsearch	2	1020	June 20, 2017
Accessing array of object in painless script , search query [ES 6.4] Elasticsearch	4	1777	December 30, 2019
Allow vector functions in script fields context Elasticsearch	4	1346	October 19, 2020
Most efficient way of accessing long[][] data in Painless scripts Elasticsearch painless	2	208	July 18, 2023

How do I access term vector in painless scripting?

Related topics