How do I access term vectors in Painless scripting?


(Noppanit Charassinvichai) #1

We're trying to replicate this Elasticsearch plugin: https://github.com/MLnick/elasticsearch-vector-scoring. The reason is that AWS ES doesn't allow custom plugins to be installed. The plugin only computes a dot product and cosine similarity, so I'm guessing it should be simple to replicate in a Painless script. Groovy scripting appears to be deprecated in 5.0 and is disabled on AWS.

Here's the source code of the plugin.

    /**
     * @param params index that a scored are placed in this parameter. Initialize them here.
     */
    @SuppressWarnings("unchecked")
    private PayloadVectorScoreScript(Map<String, Object> params) {
        params.entrySet();
        // get field to score
        field = (String) params.get("field");
        // get query vector
        vector = (List<Double>) params.get("vector");
        // cosine flag
        Object cosineParam = params.get("cosine");
        if (cosineParam != null) {
            cosine = (boolean) cosineParam;
        }
        if (field == null || vector == null) {
            throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": field or vector parameter missing!");
        }
        // init index
        index = new ArrayList<>(vector.size());
        for (int i = 0; i < vector.size(); i++) {
            index.add(String.valueOf(i));
        }
        if (vector.size() != index.size()) {
            throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": index and vector array must have same length!");
        }
        if (cosine) {
            // compute query vector norm once
            for (double v: vector) {
                queryVectorNorm += Math.pow(v, 2.0);
            }
        }
    }

    @Override
    public Object run() {
        float score = 0;
        // first, get the ShardTerms object for the field.
        IndexField indexField = this.indexLookup().get(field);
        double docVectorNorm = 0.0f;
        for (int i = 0; i < index.size(); i++) {
            // get the vector value stored in the term payload
            IndexFieldTerm indexTermField = indexField.get(index.get(i), IndexLookup.FLAG_PAYLOADS);
            float payload = 0f;
            if (indexTermField != null) {
                Iterator<TermPosition> iter = indexTermField.iterator();
                if (iter.hasNext()) {
                    payload = iter.next().payloadAsFloat(0f);
                    if (cosine) {
                        // doc vector norm
                        docVectorNorm += Math.pow(payload, 2.0);
                    }
                }
            }
            // dot product
            score += payload * vector.get(i);
        }
        if (cosine) {
            // cosine similarity score
            if (docVectorNorm == 0 || queryVectorNorm == 0) return 0f;
            return score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
        } else {
            // dot product score
            return score;
        }
    }
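For reference, the scoring math in the plugin can be separated from the payload lookup. The following is only a minimal sketch in plain Java, assuming the document vector has already been read into a list; the class name `VectorScoreSketch` and its method names are hypothetical and not part of the plugin or of Elasticsearch.

```java
import java.util.Arrays;
import java.util.List;

public final class VectorScoreSketch {
    private VectorScoreSketch() {}

    /** Dot product of two equal-length vectors. */
    static double dot(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    /**
     * Cosine similarity. Returns 0 when either vector has zero norm,
     * mirroring the zero-norm guard in the plugin's run() method.
     */
    static double cosine(List<Double> a, List<Double> b) {
        double normA = dot(a, a); // squared norm of a
        double normB = dot(b, b); // squared norm of b
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot(a, b) / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Illustrative values only
        List<Double> docVector = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> queryVector = Arrays.asList(4.0, 5.0, 6.0);
        System.out.println("dot = " + dot(docVector, queryVector));
        System.out.println("cosine = " + cosine(docVector, queryVector));
    }
}
```

The part that does not translate directly is the payload lookup (`indexLookup()` with `FLAG_PAYLOADS`), which is what the rest of this question is about.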

I'm starting with just reading a field from the index, but I'm getting an error.

Here's the shape of my index.

I've enabled delimited_payload_filter:

"settings": {
    "analysis": {
        "analyzer": {
            "payload_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": "delimited_payload_filter"
            }
        }
    }
}

And I have a field called @model_factor to store the vector:

{
    "movies": {
        "properties": {
            "@model_factor": {
                "type": "text",
                "term_vector": "with_positions_offsets_payloads",
                "analyzer": "payload_analyzer"
            }
        }
    }
}

And this is the shape of the document:

{
    "@model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
    "name": "Test 1"
}
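To make the field format concrete: each whitespace-separated token is `term|payload`, where the term is the vector index and the payload is the component value. The sketch below only imitates that split in plain Java, assuming the filter's default `|` delimiter and float encoding; it does not use any Elasticsearch or Lucene classes, and `PayloadFieldSketch` and `parse` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class PayloadFieldSketch {
    private PayloadFieldSketch() {}

    /**
     * Splits a field value such as "0|1.2 1|0.1" into term -> payload pairs.
     * A token without a '|' delimiter gets 0, which is an assumption chosen
     * to mirror the plugin's payloadAsFloat(0f) default.
     */
    static Map<String, Float> parse(String fieldValue) {
        Map<String, Float> payloads = new LinkedHashMap<>();
        for (String token : fieldValue.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            int sep = token.indexOf('|');
            if (sep < 0) {
                payloads.put(token, 0f);
            } else {
                String term = token.substring(0, sep);
                float payload = Float.parseFloat(token.substring(sep + 1));
                payloads.put(term, payload);
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        Map<String, Float> payloads = parse("0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3");
        // Print each term (vector index) with its payload (component value)
        payloads.forEach((term, value) -> System.out.println(term + " -> " + value));
    }
}
```

With this layout, looking up term `1` is expected to give the second component of the document vector, which is what the plugin does per index via the term payload.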

Here's how I use the script:

{
    "query": {
        "function_score": {
            "query": {
                "query_string": {
                    "query": "*"
                }
            },
            "script_score": {
                "script": {
                    "inline": "def termInfo = doc['_index']['@model_factor'].get('1', 4);",
                    "lang": "painless",
                    "params": {
                        "field": "@model_factor",
                        "vector": [0.1,2.3,-1.6,0.7,-1.3],
                        "cosine": true
                    }
                }
            },
            "boost_mode": "replace"
        }
    }
}

And this is the error I got.

"failures": [
      {
        "shard": 2,
        "index": "test",
        "node": "ShL2G7B_Q_CMII5OvuFJNQ",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "caused_by": {
            "type": "wrong_method_type_exception",
            "reason": "wrong_method_type_exception: cannot convert MethodHandle(List,int)int to (Object,String)String"
          },
          "script_stack": [
            "termInfo = doc['_index']['@model_factor'].get('1',4);",
            "              ^---- HERE"
          ],
          "script": "def termInfo = doc['_index']['@model_factor'].get('1',4);",
          "lang": "painless"
        }
      }
    ]

So the question is: how do I access the index field to read @model_factor in Painless scripting?

