Cannot assign numeric values in script on update by query

OlivierPetri · April 27, 2018, 9:48am

Hi there,

I am using Elasticsearch 6.2 and am trying to assign a nummeric value in a scripted update by query like so:

POST aarhuis/_update_by_query
{
  "script" : {
      "source" : "doc['SUMAB'] = doc['CNTA'] + doc['CNTB']",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}

All three fields are defined as being nummeric ( "CNTA": { "type": "integer" } and so on). But using the doc notation, with, or without .value extension, I keep getting errors, null_pointer_exception to be precise.

I have tried all sorts of combinations, and the npx is signalled at different places:

  "doc['SUMBA'].value = doc['CNTA'].value + doc['CNTB'].value",
  "    ^---- HERE"


  "doc['SUMBA'] = doc['CNTA'] + doc['CNTB']",
  "                   ^---- HERE"

If I try checking for null, I'll get npx in the null check:

if(doc['CNTA'] != null && doc['CNTB'] != null) {doc['SUMAB'] = doc['CNTA'] + doc['CNTB'];}

          "if(doc['CNTA'] != null && doc['CNTB'] != null) {",
          "       ^---- HERE"

But the best is yet to come!

First, when i use the expression

doc['CNTA'].value + doc['CNTB'].value

in a scripted field, it just works as expected!

But worst, when I change for

ctx._source.SUMAB = ctx._source.CNTA + ctx._source.CNTB

in the update query, the query will pass without failure, but it will treat both fields as text and concatenate them instead of calculating, giving me 33 for 3 + 3 and not 6.

I feel I'm making some stupid mistake somewhere (or the documentation is missing some elementary hint), so I do not file this as a bug report for now. But having spend more than a full day on this I am pretty close to madness. "Help! I need somebody..."

val · April 27, 2018, 11:36am

Are you sure that all documents have the CNTA and CNTB fields?

in a scripted field, it just works as expected!

Yes, because the 10 first documents that come back probably have both the CNTA and CNTB fields, hence the script works. The update-by-query, however, will work on the full document set.

But worst, when I change for
ctx._source.SUMAB = ctx._source.CNTA + ctx._source.CNTB

Since you're accessing the source document, that means that in your source document those fields were specified using a string instead of an integer value.

What you could do is

 ctx._source.SUMAB = Integer.parseInt(ctx._source.CNTA) + Integer.parseInt(ctx._source.CNTB)

OlivierPetri · April 27, 2018, 11:44am

Yes, I am. And all are set to values > 0.

When I do somehting like

POST aarhuis/_update_by_query
{
  "script" : {
      "source" : "try {ctx._source.SUMBA = doc['CNTA'].values + doc['CNTB'].values;} catch(Exception x){ctx._source.SUMBA = 0;}",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}

Then SUMBA will be set to zero on every document. So maybe it IS a bug? But I want to make sure I did not have some obvious syntax mistake in my code. The variations

doc['CNTA']
doc['CNTA'].value
ctx.CNTA
ctx._source.CNTA

Are not very well explained in the documentation and quite confusing. It appears not all are valid in the context of an update by query, but at least one should work. Or some other.

val · April 27, 2018, 11:48am

If your CNTA and CNTB fields are indeed integers disguised as strings, have you tried my last suggestion?

OlivierPetri · April 27, 2018, 11:53am

Thank you for your response, I was a bit fast on my first reply. Your workaround solved my problem.

What I do not understand is, the

in your source document those fields were specified using a string instead of an integer value

part. I defined the mapping beforehand as

"CNTA" : { "type" : "integer" }

and so on, then imported a CSV file via logstash.
When I query the mapping of my index, it will be displayed as integer, so i somewhat naivly expect it to indeed be an integer.

And: Shouldn't the doc notation work anyway?

val · April 27, 2018, 11:56am

Specifying this in your mapping

"CNTA" : { "type" : "integer" }

Simply tells Elasticsearch to treat whatever comes in the CNTA field as an integer.

However, if your source document contains this:

{ "CNTA": "100" }

then it is a valid integer and ES will treat it as such. But what ES won't do is transform "100" (a string) into 100 (an integer) inside the source document. That's why when you access ctx._source.CNTA you get a string, i.e. because that's what your source document contained when it was indexed into ES.

So you need to check inside your CSV file whether the CNTA field is enclosed within double quotes or not, if that's the case, you need to modify your csv Logstash filter to convert that value to an integer before the document reaches ES.

OlivierPetri · April 27, 2018, 11:59am

Ok, thank you very much. This was very helfpul.

val · April 27, 2018, 12:02pm

Awesome, I'm glad this helped.

system · May 25, 2018, 12:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.