Using a Groovy script within function_score to access a field payload

tpraizler · September 6, 2015, 2:43pm

I am trying to understand the best way to access a payload from a Groovy script.

I want to do something like this:

sum = 0;
categories  = doc['category'].values;
for (category in categories){
   sum += category.payload
}
return sum;

According to the Elasticsearch documentation I can do this with _index, but that is not what I am looking for: as far as I understand, _index gives access to statistics at the scope of the index, not for a specific document.

I want to go over every document, take its payload, and multiply it by some constant.

When I do this: _index['category'].get('term', _PAYLOADS) I get a list of all payloads of "term", which is not what I am looking for.

Is there a way to access a field's payload from the scope of a document?

a2tirb · September 7, 2015, 4:06pm

Payloads are stored per occurrence of a term in a document, so one term can have several payloads in one document. _index['category'].get('term', _PAYLOADS) will give you an iterator over the payloads, and it should have as many elements as there are occurrences of this term in the document. This will always be an iterator, even if the term occurs only once.
What do you mean by "When doing this: _index['category'].get('term', _PAYLOADS) I will get a list of all payloads of 'term'"? Do you get back more than expected?
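To make the per-occurrence point concrete, here is a toy model in Python. It is only an illustration of the semantics described above, not Elasticsearch or Lucene code: the class `ToyPayloadIndex`, its methods, and the document ids `doc_a` and `doc_b` are all hypothetical names made up for this sketch.

```python
from collections import defaultdict


class ToyPayloadIndex:
    """Toy stand-in for a positional index: doc id -> term -> payloads.

    Hypothetical helper for illustration only; it stores one payload
    per occurrence of a term in a document.
    """

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(list))

    def add_occurrence(self, doc_id, term, payload):
        # Every occurrence of a term in a document carries its own payload.
        self._postings[doc_id][term].append(payload)

    def payloads(self, doc_id, term):
        # Always an iterator, scoped to one document, even for a single occurrence.
        doc_terms = self._postings.get(doc_id, {})
        return iter(doc_terms.get(term, []))


index = ToyPayloadIndex()
index.add_occurrence("doc_a", "1000", 0.1)
index.add_occurrence("doc_a", "1000", 0.5)  # same term again in the same document
index.add_occurrence("doc_b", "1000", 0.6)

doc_a_payloads = list(index.payloads("doc_a", "1000"))
doc_b_payloads = list(index.payloads("doc_b", "1000"))
print(doc_a_payloads)
print(doc_b_payloads)
```

In this model the lookup for `doc_a` yields two payloads because the term occurs twice there, while the lookup for `doc_b` never sees the payloads of `doc_a`.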

tpraizler · September 7, 2015, 7:46pm

Oh! So maybe I got this wrong.
Let me try to put it into an example.

If I have the following 3 documents:

doc1: 
{
   "id": 1,
   "categories": ["1000|0.1","1001|0.2"]
}

doc2:
{
   "id": 2,
   "categories": ["1000|0.6","1001|0.7"]
}

doc3:
{
   "id": 3,
   "categories": ["1000|0.4","1001|0.5"]
}

If I do this:

_index['categories'].get('1000', _PAYLOADS)

My understanding was that I would get an iterator over:

[0.1,0.6,0.4]

three times, once for each doc,
and not an iterator over:

[0.1] (in case of doc1)
[0.6] (in case of doc2)
[0.4] (in case of doc3)

Is that correct, or did I get it wrong?

Thanks!

a2tirb · September 8, 2015, 9:02am

You will get an iterator for each doc, each containing only the payloads for this document. There is actually no way to get all payloads for all documents at the same time.

Here is an example:

DELETE testidx

PUT testidx
{
  "mappings": {
    "doc": {
      "properties": {
        "categories": {
          "type": "string",
          "analyzer": "payload"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "payload": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "delimited_payload_filter"
          ]
        }
      }
    }
  }
}

POST testidx/doc/
{
   "id": 1,
   "categories": ["1000|0.1","1001|0.2"]
}

POST testidx/doc/
{
   "id": 2,
   "categories": ["1000|0.2","1001|0.2"]
}

POST testidx/doc/
{
   "id": 3,
   "categories": ["1000|0.3","1001|0.2"]
}

GET testidx/doc/_search
{
  "fields": [
    "_source"
  ],
  "script_fields": {
    "payloads": {
      "script": "payloads = []; positions = _index['categories'].get('1000', _PAYLOADS); for(pos in positions){payloads.add(pos.payloadAsFloat(0))}; payloads"
    }
  }
}

yields:

"hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "testidx",
            "_type": "doc",
            "_id": "AU-sK5gxjoOwWjOATroI",
            "_score": 1,
            "_source": {
               "id": 3,
               "categories": [
                  "1000|0.3",
                  "1001|0.2"
               ]
            },
            "fields": {
               "payloads": [
                  [
                     0.3
                  ]
               ]
            }
         },
         {
            "_index": "testidx",
            "_type": "doc",
            "_id": "AU-sJk0KjoOwWjOATrk2",
            "_score": 1,
            "_source": {
               "id": 1,
               "categories": [
                  "1000|0.1",
                  "1001|0.2"
               ]
            },
            "fields": {
               "payloads": [
                  [
                     0.1
                  ]
               ]
            }
         },
         {
            "_index": "testidx",
            "_type": "doc",
            "_id": "AU-sK6BajoOwWjOATroJ",
            "_score": 1,
            "_source": {
               "id": 2,
               "categories": [
                  "1000|0.2",
                  "1001|0.2"
               ]
            },
            "fields": {
               "payloads": [
                  [
                     0.2
                  ]
               ]
            }
         }
      ]
   }
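The flow in this example (split each "term|value" token, store the value as a float payload, read it back per document, and scale it by a constant as asked in the original question) can be sketched outside Elasticsearch. The Python below is a rough simulation under stated assumptions, not the actual filter or scripting API: the function names and the `factor` parameter are hypothetical, and the 4-byte big-endian float layout is my understanding of how Lucene encodes float payloads.

```python
import struct


def split_delimited_payload(token, delimiter="|"):
    """Split 'term|value' into (term, payload_bytes).

    Returns (term, None) when the token has no delimiter.
    """
    term, sep, raw = token.partition(delimiter)
    if not sep:
        return term, None
    # Assumption: a float payload is stored as a 32-bit big-endian float.
    return term, struct.pack(">f", float(raw))


def payload_as_float(payload_bytes, default):
    """Decode a 4-byte payload, playing the role of payloadAsFloat(default)."""
    if payload_bytes is None:
        return default
    return struct.unpack(">f", payload_bytes)[0]


def score_document(categories, term, factor):
    """Sum the payloads of `term` within one document and scale by a constant."""
    total = 0.0
    for token in categories:
        token_term, payload = split_delimited_payload(token)
        if token_term == term:
            total += payload_as_float(payload, 0.0)
    return total * factor


# The three documents indexed in the example above.
docs = {
    1: ["1000|0.1", "1001|0.2"],
    2: ["1000|0.2", "1001|0.2"],
    3: ["1000|0.3", "1001|0.2"],
}
scores = {doc_id: score_document(cats, "1000", 2.0) for doc_id, cats in docs.items()}
for doc_id, score in scores.items():
    print(doc_id, round(score, 4))
```

Each score depends only on the payloads found in that one document. Because the payload is stored with 32-bit precision, the decoded values are close to, but not exactly, the decimal values that were indexed.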

tpraizler · September 9, 2015, 3:15pm

This is awesome!! thanks!!!

Do you think it makes sense to use _index to access payloads at scale?
That is, I want to query my index with a Groovy script that uses _index on every query.
From what I have read, _index is not very performant.

So is it safe to use _index, or is there a more performant option?

And thanks again!!
