Calculations with values of different documents in painless query

Hello

I would like to use a painless script to do some calculations. The problem is, the values are stored in different documents as time series. For example the index has a mapping of 2 fields fieldA and fieldB. There can be documents with values for both fields but usually documents with only one value for one field.
How can I use a painless script to add values of two documents e.g. sum up the latest document for each field?

Example:
Document 1:

{
"@timestamp": "2020-09-20T19:19:05.000Z",
"fieldA": 10
}

Document 2:

{
"@timestamp": "2020-09-20T19:19:06.000Z",
"fieldB": 20
}

Now I want to do something like this:

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now/d",
              "lt": "now/d+1d",
              "time_zone": "+02:00"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "_source": false,
  "script_fields": {
    "@script": {
      "script": {
        "lang": "painless",
        "source": "doc['fieldA'].value + doc['fieldB'].value"
      }
    }
  }
}

The result would be 20, because it is the most recent document.
How can I use the painless script to find the most recent values for both fields?

  1. Should fieldA and fieldB really be strings? If yes, you'll need to use the keyword field for the calculation (and you don't want to turn on fielddata instead). Though maybe numbers are the better choice anyway?
  2. You could either use a check doc[<field>].size()==0 for missing fields or set it explicitly to 0 with null_value in the mapping.

Oh, sorry, my bad. The numbers are stored as numbers of course. I will edit the first post.

Ok, then you need to work around missing fields. Does using null_value for 0 make sense? Otherwise you'll need a conditional.

Well I rather want to have the possibility that the query finds both most recent documents than excluding one of it because the calculation is useless with only one field value available...

How can a query request both independent documents and then running the script on the result?

The ugly workaround would be to parse the script query, extract the field name and do separate multi search queries for each field. Then interpret the painless script in the backend to return the result. But this would make the use of the painless script completely useless.

In your initial query, there is a "size": 1 and you also write about "the most recent document". That could be covered by a search operation.

Now it's about "both independent documents", which would need an aggregation.

It's a bit unclear what is the goal here.

Let's say there are several documents in the index. Some of them have only fieldA some of them only fieldB. They could be sensor values measured at slightly different timestamps. I want to calculate the sum or run other mathematical operations with the most recent values (within a time range) of fieldA and fieldB by using the script function.

How can I do this?

I've tried to simplify it to the core parts — what about this:

PUT test/_doc/1
{
"@timestamp": "2020-09-20T19:19:05.000Z",
"fieldA": 10
}

PUT test/_doc/2
{
"@timestamp": "2020-09-20T19:19:06.000Z",
"fieldB": 20
}

GET test/_search
{
  "size": 0,
  "aggs": {
    "my_sum": {
      "scripted_metric": {
        "init_script": "state.elements = []", 
        "map_script": "state.elements.add(doc['fieldA'].size()>0 ? doc.fieldA.value : 0);state.elements.add(doc['fieldB'].size()>0 ? doc.fieldB.value : 0)",
        "combine_script": "long sum = 0; for (t in state.elements) { sum += t } return sum",
        "reduce_script": "long sum = 0; for (t in states) { sum += t } return sum"
      }
    }
  }
}

And it does return:

"aggregations" : {
  "my_sum" : {
    "value" : 30
  }
1 Like

Thanks! I will try it.
Will it always find the most recent document based on @timestamp? How can I restrict it to a specific time range?

Add your initial "query": { between size and aggs to filter down to the documents you want to aggregate on.

1 Like

This is the solution! Thank you! :slightly_smiling_face:

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.