Calculations with values of different documents in painless query

alexfritz · September 20, 2020, 7:31pm

Hello

I would like to use a painless script to do some calculations. The problem is, the values are stored in different documents as time series. For example the index has a mapping of 2 fields fieldA and fieldB. There can be documents with values for both fields but usually documents with only one value for one field.
How can I use a painless script to add values of two documents e.g. sum up the latest document for each field?

Example:
Document 1:

{
"@timestamp": "2020-09-20T19:19:05.000Z",
"fieldA": 10
}

Document 2:

{
"@timestamp": "2020-09-20T19:19:06.000Z",
"fieldB": 20
}

Now I want to do something like this:

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now/d",
              "lt": "now/d+1d",
              "time_zone": "+02:00"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "_source": false,
  "script_fields": {
    "@script": {
      "script": {
        "lang": "painless",
        "source": "doc['fieldA'].value + doc['fieldB'].value"
      }
    }
  }
}

The result would be 20, because it is the most recent document.
How can I use the painless script to find the most recent values for both fields?

xeraa · September 21, 2020, 12:49am

Should fieldA and fieldB really be strings? If yes, you'll need to use the keyword field for the calculation (and you don't want to turn on fielddata instead). Though maybe numbers are the better choice anyway?
You could either use a check doc[<field>].size()==0 for missing fields or set it explicitly to 0 with null_value in the mapping.

alexfritz · September 21, 2020, 8:05am

Oh, sorry, my bad. The numbers are stored as numbers of course. I will edit the first post.

xeraa · September 21, 2020, 5:37pm

Ok, then you need to work around missing fields. Does using null_value for 0 make sense? Otherwise you'll need a conditional.

alexfritz · September 21, 2020, 6:43pm

Well I rather want to have the possibility that the query finds both most recent documents than excluding one of it because the calculation is useless with only one field value available...

How can a query request both independent documents and then running the script on the result?

The ugly workaround would be to parse the script query, extract the field name and do separate multi search queries for each field. Then interpret the painless script in the backend to return the result. But this would make the use of the painless script completely useless.

xeraa · September 22, 2020, 2:34am

In your initial query, there is a "size": 1 and you also write about "the most recent document". That could be covered by a search operation.

Now it's about "both independent documents", which would need an aggregation.

It's a bit unclear what is the goal here.

alexfritz · September 22, 2020, 4:34pm

Let's say there are several documents in the index. Some of them have only fieldA some of them only fieldB. They could be sensor values measured at slightly different timestamps. I want to calculate the sum or run other mathematical operations with the most recent values (within a time range) of fieldA and fieldB by using the script function.

How can I do this?

xeraa · September 23, 2020, 4:27am

I've tried to simplify it to the core parts — what about this:

PUT test/_doc/1
{
"@timestamp": "2020-09-20T19:19:05.000Z",
"fieldA": 10
}

PUT test/_doc/2
{
"@timestamp": "2020-09-20T19:19:06.000Z",
"fieldB": 20
}

GET test/_search
{
  "size": 0,
  "aggs": {
    "my_sum": {
      "scripted_metric": {
        "init_script": "state.elements = []", 
        "map_script": "state.elements.add(doc['fieldA'].size()>0 ? doc.fieldA.value : 0);state.elements.add(doc['fieldB'].size()>0 ? doc.fieldB.value : 0)",
        "combine_script": "long sum = 0; for (t in state.elements) { sum += t } return sum",
        "reduce_script": "long sum = 0; for (t in states) { sum += t } return sum"
      }
    }
  }
}

And it does return:

"aggregations" : {
  "my_sum" : {
    "value" : 30
  }

alexfritz · September 23, 2020, 8:49am

Thanks! I will try it.
Will it always find the most recent document based on @timestamp? How can I restrict it to a specific time range?

xeraa · September 23, 2020, 11:54pm

Add your initial "query": { between size and aggs to filter down to the documents you want to aggregate on.

alexfritz · September 29, 2020, 10:15pm

This is the solution! Thank you!

system · October 27, 2020, 10:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is it possible to write a json/painless script that ignores some values? Elasticsearch	2	384	January 4, 2019
New to Painless, can someone please help? Elasticsearch painless	8	1218	August 5, 2020
Using Painless Script with Lucene Query in ElasticSearch Elasticsearch	2	1149	October 31, 2018
Painless script to find the difference between two timestamp values Elasticsearch painless	4	448	May 10, 2023
Painless - how to access "@timestamp" field in script Elasticsearch	4	6697	May 31, 2018

Calculations with values of different documents in painless query

Related topics